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Abstract. 

In large asexual populations, multiple beneficial mutations arise in the population, 
compete, interfere with each other, and accumulate on the same genome, before any 
of them fix. The resulting dynamics, although studied by many authors, is still not 
fully understood, fundamentally because the effects of fluctuations due to the small 
numbers of the fittest individuals are large even in enormous populations. In this paper, 
branching processes and various asymptotic methods for analyzing the stochastic 
dynamics are further developed and used to obtain information on fluctuations, time 
dependence, and the distributions of sizes of subpopulations, jumps in the mean fitness, 
and other properties. The focus is on the behavior of a broad class of models: those 
with a distribution of selective advantages of available beneficial mutations that falls 
off more rapidly than exponentially. For such distributions, many aspects of the 
dynamics are universal — quantitatively so for extremely large populations. On the 
most important time scale that controls coalescent properties and fluctuations of the 
speed, the dynamics is reduced to a simple stochastic model that couples the peak 
and the high-fitness "nose" of the fitness distribution. Extensions to other models and 
distributions of available mutations are discussed briefly 

Keywords: Mutational and evolutionary processes (Theory); Population dynamics 
(Theory); Models for evolution (Theory) 
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1. Introduction 

Classical population genetics primarily focussed on the dynamics and statistics of 
mutations that exist or arise in a population under the approximation that their eventual 
fate is primarily determined by their own contribution (or sometimes also those of 
a few other alleles) to the fitness of the organism. But in large populations many 
new beneficial mutations arise each generation: these compete with each other and 
further mutations occur in the descendants of mutant individuals, before the fate of 
the earlier mutations is apparent. Especially as microbial populations are usually large 
enough for these effects to be important, one would like to understand the dynamics and 
statistics of this multiple mutation regime. But even the simplest models of the effects 
of multiple beneficial mutations and the resulting population dynamics have proved to 
be surprisingly — at least to this statistical physicist — difficult to analyze. (For a 
recent review, see reference [1], and for earlier work, see, e.g. references [2, 3]. ) 

The simplest model is a large asexual population of very similar organisms 
descended very recently from a common ancestor which evolve in a fixed environment 
with the only "ecological" interactions among the organisms acting to keep the 
population approximately constant (because, e.g., of a limited supply of a primary 
nutrient or predation) with no dependence of the interactions on the genotypic 
differences. The "fitness" is then solely a function of an individual's genome: simply 
the difference between the birth and death rates with that genome. The "mean-field" 
interactions between the organisms act to make the net growth rate of a clonal sub- 
population equal to the difference between the fitness of its genome and the average 
fitness of the population. As long as this does not change by too much, only the net 
growth rate matters: one can assume, e.g., that the death rate is constant and the birth 
rate equal to the death rate plus the fitness relative to the mean. 

If the "fitness landscape" - - the fitness as a function of the genome — in the 
vicinity of the genomes of the (assumed almost identical) existing organisms has a 
great number of uphill directions accessible by single mutations, then the evolutionary 
dynamics is primarily determined by the statistical properties of the landscape. If these 
do not change much as the population evolves, then the most crucial feature of the 
landscape is the distribution of the numbers and selective advantages, s, of potentially 
beneficial mutations available. Multiplied by the corresponding mutation rates, these 
can be lumped together into a fi(s)ds, defined as the rate per individual for beneficial 
mutations in an interval (s, s + ds). Although this model is often framed in terms of an 
assumption that the effects of the mutations on the fitness are additive, it is actually 
far more general obtaining as long as the fitness landscape is static with statistically- 
uniform properties and there are no genotype-dependent interactions or other properties 
of the organisms that affect their birth or death rates. 

The effects of stochastic fluctuations in the number of births and deaths in a 
subpopulation are small when the subpopulation is large: one would thus expect that 
the dynamics in the limit of very large populations would be essentially deterministic. 
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But this is fundamentally wrong except close to a fitness peak: indeed, it quickly leads to 
absurd results. The crucial feature of evolutionary dynamics is that even in enormous 
populations, a single individual that is fitter than all the others can take over the 
population in a time that is only logarithmic in the population size. Thus rare events 

- in particular mutations that produce individuals fitter than any that already exist 

- play an essential role in the dynamics. One can analyze these by branching process 
methods, as we do herein, but analyzing the interplay between the stochastic rare events 
and the almost-deterministic but non-linear (due to the fixed population size constraint) 
dynamics of the bulk of the population is not straightforward. And average quantities 
are usually very bad characterizations of the typical behavior. This is true even in the 
presence of several helpful small parameters that are related to the very large populations 
and low beneficial mutation rates. 

The most basic question is the average rate of increase of the mean fitness of the 
population, v. In the deterministic approximation, this will increase exponentially fast 
and there will be no steady state. In contrast, for any finite N there will be a statistical 
steady state with a finite average speed. The most basic task is to understand the 
properties of this steady state. 

Many authors — particularly statistical physicists — have studied this model over 
the past decade, [1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] often focussing on 
the simplest case of = UbS(s — s): i.e. a single fitness increment. This corresponds 
most simply to a landscape that looks like a uniform staircase with equal sized steps of 
height s. But if one is interested in the diversity within the evolving population, one has 
to remember that the landscape is actually very high-dimensional with a large number 
of possible mutations for each step: U b is the total rate for all of these. [19, 20, 21] 

At this stage, many of the properties of the statistical steady state of the staircase 
model are pretty well understood. [10, 1, 12] But most of the methods used are 
somewhat tricky and rely on various Ansatz's and heuristic arguments. They are hard to 
generalize to a distribution of mutational effects, although these are the focus of "clonal 
interference" analyses [5, 10, 17] for which competition between mutations of different 
selective advantages is assumed to dominate the dynamics, and recent work by Good et 
al [12] . One of the purposes of this paper is technical: to analyze the general model with 
a distribution of mutational effects by several methods, reproducing in a more controlled 
way many of the existing results, elucidating some of the difficulties and correcting some 
errors, establishing which aspects are universal, and — it is hoped — providing means 
for analyzing other models and phenomena. But the primary goal of this paper is to 
move away from average properties and explicitly focus on fluctuations: the primary 
new results concern these. The basic strategy is to first allow the population size to 
vary and then use results from this to study the fixed-population-size model of primary 
interest. 

1.0.1. Outline: In the remainder of this introduction the model is defined and its 
heuristic behavior then discussed in Sec. (2) whose first subsection is a review of some 
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previous work. Analysis of the branching process that generates the fluctuations and 
the dynamical consequences of these are carried out in Sec. (3) and Sec. (4), which also 
discusses distributions. Technical details from these sections are relegated to appendices. 
Section (5) focusses on the dynamics on the most important time-scale: this is controlled 
by a simple effective model. Sections (6) and (7) introduce other possible approaches 
and their pros and cons. The last section summarizes the results and their limitations, 
and discusses possible extensions and future directions. 

1.1. Model and fitness distribution 

As long as one is not interested in individuals, all that matters is the dynamics of the 
fitness distribution of the population: we denote by p(x,t)dx the number of individuals 
with fitness in the interval (x, x + dx) at time t. We refer to p(x) as the "fitness 
distribution" although its integral is the total population, N(t) = / dxp(x, t) , rather 
than unity: dividing it by N(t) yields the instantaneous probability density of the fitness. 
The growth rate of a population with fitness x is 



with T(t) adjusted to keep the total population constant: T(t) must thus be (up to 
small fluctuations) equal to the mean fitness of the population at time t, 



To understand the behavior, it is useful to consider relaxing the fixed population size 
constraint: this we will do in various ways. In the last section we briefly discuss 
different forms of the dependence of the growth rate on x, but largely focus on the 
simple conventional form, Eq.(l). 

Although the argument about individuals with the highest fitness suggests that one 
can not make a continuous approximation to p, in fact one can as long as the width of the 
fitness distribution is small compared to the mean birth and death rates — which we will 
assume. But in making this continuous population size approximation, it is essential to 
include the right form of the stochastic noise that represents the fluctuations in numbers 
of births and deaths: these must be of order yjp~. This approximation is then exactly 
equivalent to the "diffusion approximation" often used in population genetics. [22] 

The dynamics of the general model we consider is given by a stochastic integro- 
differential equation of the form 



where the linear operator, £ , includes the effects of selection and mutations, t]bd 
is gaussian white noise (with the Ito convention) which, when multiplied by \^2p, 
represents the close-to-gaussian fluctuations in the difference between births and deaths, 
and T(i) can be adjusted to keep the total population, / p(x)dx, constant at N. Time is 
measured in units of the inverse death or birth rate which are approximately equal and 
roughly constant as long as the total increase in fitness is small. We consider a general 



X = x- T(f) 



(1) 




(2) 




(3) 
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distribution of mutations, /i(s)ds. Then with the linear dependence of the selection on 
x, we have 



[For derivation of the basic equation for the stochastic dynamics, see appendices of Good 
et al [12] .] 

We expect that in a large steadily evolving population with no depletion of the 
supply of beneficial mutations, the mean fitness of the population will increase steadily, 
so with N fixed, T(t) vt, with fluctuations around this. The simplest quantities of 
interest are how the mean speed, v, depends on N and on /i(s). As values of s in the tail 
of the distribution will play an important role, but many quantities will only depend 
logarithmically on /i, it is useful to define the — typically large — 



As discussed in [10] - henceforth DFG — and Good et al [12] the behavior depends 
qualitatively on whether A(s) is convex up or down: i.e. whether it increases faster 
or more slowly than linearly for large s. We will primarily focus on the former: the 
short-tailed case for which fi(s) decreases faster than exponentially: many properties of 
the short-tailed case are universal in the limit of large populations sizes, as we shall see. 
The latter, long-tailed, case is discussed in the last section — along with an argument 
that it is unlikely to be relevant for continuous evolution which is our primary interest 
here. 

Note that if T(t) were fixed independent of the stochastic fluctuations in the 
populations, the dynamics would be simply a branching process which — at least in 
principle — can be analyzed fully. But, as with all branching processes, at long times 
the population will either go extinct or grow without bound. We are thus faced with a 
choice: either to assume the speed is essentially constant and keep the total population 
fixed by some heuristic means, or to try to analyze the full problem with the population 
size fixed, or to change the model in some way so that the population size fluctuates but 
neither goes extinct nor diverges. [11] Most authors have taken the heuristic approach 
and we consider that first. 

2. Heuristics 

2.1. Staircase model 

The basic heuristics for the dynamics of the fitness distribution of the staircase model 
has been worked out in a variety of ways. [10, 16, 1] We focus solely on populations large 
enough that there are many new beneficial mutations each generation, iVC7& >> 1: the 
multiple mutations regime. This subsection reviews the heuristic arguments of DFG. 

At any given time, the high-fitness tail of the distribution of the population extends 
out to a "nose" at fitness Q above the mean: beyond this the total population is so small 
that it is very likely to quickly die out in the absence of mutations feeding it. [4] We 




(4) 



A(s) = log[l/M*)] • 



(5) 
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refer to Q as the lead of the nose. At x = x + Q there is a population large enough 
that it has established — i.e. very unlikely to die out from fluctuations — and growing 
exponentially at rate Q: this we call the lead population. The condition that this 
subpopulation has established is that its size is larger than roughly 1/Q. 

After it is has established, the lead population grows exponentially and after some 
time, T es t, will itself have produced enough beneficial mutations to further advance the 
lead by s. Thus the average speed of the nose is simply. 

v = 7"^T (6) 
[Test) 

Self-consistency requires that the average speeds of the mean and the nose be the same. 

A time t after a lead population is established, it will have grown by a factor of 
e Qt-vt 2 /2 _ e (Q-vt/2)t because growth rate slows down as Q — vt due to the advance 
of the mean. The lead population produces further beneficial mutations at rate £4 per 
individual. The probability that a new mutant will establish is roughly Q. [22] Thus the 
time to establish a new lead population is when of order 1/Q new mutants have been 
produced. The average time for the lead population to grow large enough to produce 
mutations that will establish the next lead population is thus 

r est ~ "^y (7) 

with Q the mean lead and Q — \s the average growth rate of the lead population between 
when it establishes and when the next population establishes, by which time its growth 
rate will have decreased to Q — s. Note that s rather than Q appears in the logarithm 
because of multiple mutations that establish soon enough to contribute to the new lead 
population. [16] 

The populations near the nose comprise only a tiny fraction of the population. 
Indeed, when it first establishes, the lead population has size only ~ A. If further 
mutations into the lead population can be ignored, then a time t after it is established 
this subpopulation's size is roughly exp(Qt — ^vt 2 )/Q which peaks after time Q/v at 
^ exp(Q 2 /2v ). At this point it is the largest subpopulation: its peak size is thus a simple 
estimate for the total population size. Given the sloppiness in this argument, the best 
we can hope for at this stage is logarithmic accuracy: 

log(7VQ) « I* . (8) 

The time it takes for the lead population to grow until it becomes the largest 
subpopulation is roughly analogous to the (half) sweep time for a single mutation in a 
small population, thus we call it — rather loosely — the "sweep-time" : 

r sw « Q/v . (9) 

This is the time delay between the dynamics of the nose and the consequences of these 
for the bulk of the distribution: it is thus a particularly important time scale. 
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From the simple arguments above, one obtains 

v » ~s 2 |^[21og(iV5) - \og{s/U b )] + y\2\og{Ns) - log(s /U b )] 2 - log 2 (§M)} 

When log(iVs) is substantially larger than log(s/U b ), the regime on which we will focus, 
then the basic result of DFG follows: 

?[2log(Ns)-log(s/U b )] 
log 2 (~s/U b ) 

As we shall see, this result turns out to be true much more generally than the staircase 
model, [10, 12] provided the appropriate effective s and U b are used. 

The gaussian time-dependence of a sub-population with any fixed fitness implies 
that the the fitness distribution is a sum of sharp peaks with a close-to-gaussian envelope: 

p(x, t) ~ £ N5(x - ks)e' [x - r(t)]2/2v (12) 

k 

with Tsii. This is illustrated schematically in figure 1. 

2.1.1. Large parameters and asymptotic simplifications: The crucial simplification that 
enables both heuristic and systematic analyses to be valid is the separation, at any 
fixed time, between the strongly fluctuating subpopulations near the nose and the 
subpopulations near the mean fitness which dominate the total population size. For the 
former, non-linearities needed to keep the population fixed do not matter, while for the 
latter, stochastic fluctuations from births and deaths do not matter. In between, neither 
fluctuations nor non-linearities matter: the established subpopulations grow essentially 
deterministically at a gradually slowing down rate (due to the advancing T(i)), but they 
are still small enough that they do not contribute substantially to the total population. 
This is thus a classic case for which one can hope that asymptotic methods should 
be good. Near the nose one can use branching process methods; near the peak of 
the population distribution one can fix the total population from the deterministically 
growing subpopulations; and in between both approximations should be good. Thus, 
at least in principle, one should be able to develop separately asymptotic expansions 
for the nose and for the dynamics of the mean and then match them together in the 
intermediate regime. 

The basic large parameter that enables the nose and the bulk of the fitness 
distribution to be treated separately is the large population size, more precisely 

L = log(JVs). (13) 

However as the beneficial mutation rate is usually small as well (except perhaps in some 
viral populations) the parameter 

e = io g (§M) (14) 

is also substantially large than unity. Here U b is the effective mutation rate of 
the predominant beneficial mutations that drive the evolution: these have selective 
advantage near to s. The dependence of s and U b on /i(s) will be analyzed below. 
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Figure 1. Schematic of the fitness distribution of a steadily evolving population in 
the huge population, high speed, regime for the staircase model. The sizes, n, of 
the subpopulations as a function of their fitness, x, are shown on a logarithmic scale 
by vertical bars. Mutations from the fittest subpopulation will establish a new lead 
population that is fitter by a fixed amount s, as indicated by the rightmost red arrow. 
This increases the fitness of the nose of the distribution, which leads the mean, x, by 
Q. After a time delay of t sw — Q, the former lead population will have risen to be the 
largest subpopulation thereby driving the dynamics of the mean. The envelope of the 
fitness distribution (blue) is close to a gaussian cutoff at x + Q: this envelope moves 
at average speed v equal to the rate of advance of the nose. 

The red arrows indicate a series of mutations that could lead to the lineage from 
which they arose taking over the population: this is unlikely to occur except from a 
fitness of order (vs) 1 ^ 3 or less from the nose. This and several of the other important 
fitness scales are indicated. In order of decreasing magnitude: the lead, Q, the standard 
deviation of the fitness distribution, y/v, (vs) 1 / 3 , and the fitness increment, s. 



Asexual Evolution Waves: Fluctuations and Universality 



9 



An important quantity is the ratio between the typical time for the nose to advance, 
r est , and the time for the lead population to become the largest population: the sweep 
time t sw . The ratio of these, which we denote q, is the typical number of mutations 
above the mean that are needed to establish the nose. If the distribution of strengths of 
the predominant mutations is narrow, then q ps Q/s. When the population is not too 
large — log(iVs) only a small multiple of log(s/C/&)) — or if there is a long tail to the 
distribution of beneficial mutations, [J,(s), q will be small and will fluctuate (as we discuss 
later). But in very large populations, especially with high mutation rates such as occur 
in mutator populations (so that log(s/C/&) is not large), q can become substantial: \jq 
is then a useful small parameter. The self-consistency condition between the assumed 
speed of the mean and the speed of the nose yields for the staircase model, 



interest is consistent with our other approximations). The condition for the population 
to be in the multiple mutations regime is that NUb ^> 1, equivalent to L > £ so 
that q > 2. For q to be truly large requires L ^> I: much larger populations. Thus 
asymptotic large-population-size behavior will be approached very slowly. Nevertheless, 
the qualitative, and many of the quantitative, features are correctly captured by the 
behavior in the large q limit. 

The behavior further simplifies if the population is large enough that v/s 2 qji 
2L/£ 2 is not too small: the mean fitness then increases relatively steadily because many 
subpopulations contribute substantially to it at any given time. This can be seen 
by summing over the subpopulations from Eq.(12): for v ^> s 2 the sum can be well 
approximated by an integral. In practice, factors of n help and even for v ~ 5 2 /5, 
the instantaneous speed, % ~ does not fluctuate much on time scales of order 

r est — s/v. But we shall see later that when v > s 2 , small variations in the timing of the 
establishments are important on longer time scales than r est and these yield fluctuations 
of the speed that are of order the average speed. 

We call the limit L>f 2 , for which v ^> s 2 , the huge population size, or high speed, 
regime. The regime in which the population is not large enough for the mean fitness 
to increase smoothly on time scales of the establishments of new lead populations, but 
still large enough that multiple mutations accumulate before any fix, we call the modest 
population size or modest speed regime. This obtains for £ < L < £ 2 for which v < s 2 . 

2.2. Instability and fluctuations of the nose 

A crucial aspect of the dynamics — and the root cause of the difficulties in analyzing the 
model — is that the nose is unstable. If a fluctuation causes the nose to advance faster 
than average, then Q will increase so that the lead population will grow at a faster rate 
and the next population be established sooner, causing Q to increase further. Similarly, 
if Q decreases, the slower growth of the lead population will cause it to decrease further. 




(15) 



(or more precisely, q ~ (L + \Jh 2 — iV) j £, but neglecting the difference in the regimes of 
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On time scales longer than the time between establishments, r est , but shorter than the 
sweep time, t sw , the heuristic arguments given above imply that 

^ ps e(Q - Q) + fluctuations (16) 

with Q(v) the mean lead of the nose at mean speed v and e the growth rate of fluctuations 
in Q. A time t sw later, when the former lead population has grown to be the largest 
subpopulation, the fluctuations of the nose would affect the total population size so 
that T(t) has to be adjusted to keep N fixed: the nose fluctuations therby drive the 
fluctuations of the whole fitness distribution. For the constant s model with beneficial 
mutation rate C/ 6 , a simple calculation yields the eigenvalue, e, of the instability of the 
nose: if Q increases due to a fluctuation, the speed will increase proportionally. Thus 
at fixed speed of the mean of the population, the dynamics of the lead is 

sQ „ Q-Q Q-Q 

= v ps v — - — ~ 17 

log(s/U b ) Q r s , 




I sw 



therefore 



e ps — (18) 

Tsui 



(where we have here neglected differences between Q and Q — 

For large populations the sweep time is the most important time scale for the 
dynamics. That the time scale of the nose instability, 1/e, is similar, is the cause of 
many of the difficulties in the analysis: there is a time delay but no separation of time 
scales between the fluctuations of the nose and the needed adjustments of the mean to 
keep the population fixed. [In the last section we will discuss models for which there is 
such a separation of time scales.] 



2.3. Predominant mutations with short-tailed p(s) 

With a distribution of the strengths of the selective advantages, a heuristic argument for 
the relative contributions of the various s can be made in terms of their contributions to 
establish a new lead population with lead Q. The rate of mutations to Q from a fitness 
distribution p(x) is / dsp(s)p(Q — s). If the envelope of the fitness distribution has a 
similar shape near its nose to that of the staircase model (see figure 1), then we expect 
it to rise roughly exponentially for some range of x behind the nose: p(x) ~ e^~ x ) 
with 7 ps — |q its log-slope which in the staircase model is 7 ps 2 This suggests 
that the weighted effects of the mutations of different strengths will involve 

J dsp(s)e^ s . (19) 

This is reasonable if fi(s) decays slower than exponentially — the short-tailed case - 
for which the integral of fi(s)p(Q — s) will be dominated by \ near Q. Furthermore, the 
exponential weighting, Eq.(19, suggests that there will be a relatively narrow range near 
to some s, the predominant s that we denote s, that dominate the evolutionary dynamics: 
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we shall see that this is indeed correct. [10, 12] The predominant s is determined by 
the maximum of the integral in Eq.(19) and is thus related to 7 by 

d log a. , . 

In the long-tailed case with p(s) decaying more slowly than exponentially, the 
exponentially weighted integral over s is divergent so that the downwards curvature 
of logp(x) must be taken into account: the behavior, even for very large populations, 
is then somewhat different as we discuss in the last section. [10, 12] 

3. Branching Process at Fixed Speed 

For general p(s), and even for the staircase model with \x = US(s-s), if one is interested 
in fluctuations, distributions of sizes of subpopulations, dynamics, etc., analyzing the 
behavior near the nose is subtle even though formally it can be done exactly. [Good 
et al [12] also carry out a branching process analysis: the approach we take here is 
somewhat different although some of it parallels theirs closely as discussed at the end of 
this section.] We first give the general branching process analysis in the approximation 
of fixed speed — allowing the total population size, N(t), to fluctuate around some 
typical value that is of order the fixed population size of interest. We then use this to 
derive various results that can be "matched" on to analysis of variations of the mean 
fitness that are needed to keep the population size fixed — compensating for the delayed 
effects of the nose fluctuations that occurred a time r sw earlier. 
It is convenient to go to the moving frame and define 

X = a; -T(t) with T = vt (21) 

with the associated linear operator 

C v = Cq + v^-, (22) 

the second term coming from the shift to the moving frame and C defined in Eq. (4) . To 
compute any quantities of interest at time T, one can use the generating function 



Z T ({(}) = (exp 



(23) 



where the average is over the stochastic fluctuations from some initial time to time T 
and, if the initial conditions are not fixed, also over the distribution of these. 

With the branching process dynamics — either for discrete integer numbers of 
individuals in subpopulations or for the continuous p (diffusion) approximation we use 
— the generating function at time T can be related to that at T—dt by averaging over the 
fluctuations in the dt time interval. This can be seen as follows: With the continuous- 
population-size stochastic dynamics of p{x,t), the noise term in p(x,T) — p(x,T — dt) 
is proportional to \fdt. Expand the generating function to order dt and then average 
over the noise in the dt time interval and reexponentiate. Second order in the noise 
times \fd~i yields a term ( 2 p. And first order in the rest yields a linear term from the 
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deterministic part of the dynamics: this has the form / dx((x)£vp(x, T — dt). The latter 
can be rewritten in terms of the adjoint operator, acting on £ to yield an exponent 
of the generating function of the same form but in terms of the fitness distribution at 
time T — dt and C(x) replaced by a function 0(x). 

By iteration one thereby obtains the generating function at time T in terms of the 
generating function at any earlier time, t < T: 



The dynamics can be integrated back to the initial time where the generating 
function is found simply in terms of the initial fitness distribution, p(x, 0) (or more 
generally averages of this): 



Note that the backwards equation is analogous to using the method of characteristics 
to solve the Laplace transform of the multi-(or infinite-) dimensional Fokker Planck 
equation for the joint distribution of {p(x,T)}. If one is interested in quantities at 
more than one time, then one can generalize the exponent of the generating function to 
— Jo dt J dxd(x,t)p{x,t) with the one-time case equivalent to 0(x,t) = 5(t — T)((x). 
The equation for Eq.(25), then has an additional term: +9(x,t). 

The key quantities will turn out to be the fixed point, 0*, of the branching process 
equation, C\<p — 2 = and the lowest eigenfunction, ip(x), of the linearized operator 
about this fixed point which has eigenvalue — e with e > corresponding to the fixed 
point being stable. As we shall see, this eigenvalue is controlled by the dynamics of the 
nose with e the mean growth rate of deviations of the lead, Q(t), from its average value 
in steady state, as discussed above. The other — trivial — fixed point, <f>(x) = for all 
X, is unstable: 0* is a global attractor. 

Note that integer-population-size models can be analyzed similarly: the only 
difference is that — d(f>/ dt has a different non-linear term which i is periodic in — > (p+2m 
because the population density, p(x), is a series of S functions with integer weights. The 
detailed form of the non-linear term depends on the distribution of numbers of offspring, 
etc., but for small 0, is proportional to 2 . All of the universal features are dominated 
by this small behavior as long as the range of selective advantages or disadvantages 
of the population is small compared to the birth and death rates at all times: i.e. the 
conditions under which the continuous (diffusion) approximation is good. 




with "initial" — i.e. final — condition 





(28) 
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3.0.1. Fixation probability: Because of the instability of the branching process, at long 
times N(t) = / d\ pix-i wm either be zero or very large and diverging with t. This 
is reflected in the backwards-in-time convergence of <fi to 0* for almost all "initial" 
conditions C(x)- The fixed point, 0*, of the backwards time equation for 0, is simply 
related to the fixation probability of an "individual" with fitness relative to the mean 
X- More precisely, given p(x,t = 0), 

niim N(t) < oo] = lim (exp [-(N(t))}) = (exp(- f d XP ( X , O)0*(x)\(29) 

for any positive £. Thus for small p(x, 0), 

V[N(oo) = oo] « y dxp(x,O)0*( X ). (30) 

Information on the distribution of N(t) from the generating function: If initially 
p(x, 0) is typical of the steady-state traveling wave, then we expect that for large t the 
distribution of N(t) will be very broad as its value is roughly determined by the time 
at which the nose goes unstable after which N(t) will either grow or shrink in a roughly 
deterministic manner. For a broad distribution the generating function has a simple 
interpretation for positive (: 

(exp(-TVC)) » V[N < 1/C] (31) 

as the contributions to the generating function from almost all N > ( are very small, 
while those from almost all N < ( are approximately unity. Even when the distribution 
is not so broad, this is a useful approximation and the inverse of the value of ( at which 
the generating function is 1/e (or some other 0(1) value) can — at least in some sense 
- be considered a typical value of N. [Note that we use ( here as the generating 
function variable: this corresponds to C(x) being independent of x] 



3.0.3. Pathologies of averages of fitness distributions: For branching process generally, 
and especially for faster-than-exponentially unstable ones as in the present case, averages 
of population sizes are very poor characterizations of their typical values. Averages 
of p(x, t) are given by the linear term in the expansion of the generating function in 
powers of C(x) : the results are, of course, the same as obtained by directly averaging the 
equation for ^ — j. e . the deterministic approximation. By use of Laplace transforms, 
the dynamics of the average t)) can be analyzed as carried out in Appendix A. 
The mutation spectrum enters the results via its Laplace transform: 

M (X) = J dsfi(s) [e Xs -l] (32) 

(with the mutation-out term subtracted so that Mq(0) = 0). In particular, the time 
dependence involves Mo(\ + t), Eq.(A.3). 

Without mutations an initial gaussian p(x, 0) ~ N(0) exp(— x 2 /2f ) j\j2ixv would 
give {p(x, t)) independent of time. But with a truncated gaussian p(x, 0) cutoff at a nose 
with fitness x = Q above the mean — i.e. p(x, 0) ~ N(0)Q(Q — x) exp(— x 2 /2f ) / \J1~kv 
- the behavior after a time t sw = Q/v is dominated by descendants of the lead 
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population. Beyond this time the population would decay as exp(— v(t — r sw ) 2 /2) in 
the absence of mutations. 

With mutations, the long-time behavior of the average fitness distribution is very 
bad. For truncated gaussian initial conditions, the effects of the mutations on the total 
population size are small initially but grow rapidly for t > t M log[l/ [/,(sm)]/sm with 
sm a characteristic s that dominates M.$(\ = £m)- Beyond time tM, the mutational term 
grows exponentially or faster in time as does the position of the peak of the distribution, 
(p(x))- Concomitantly, the average total population size grows as a double exponential 
of time. 



3.0.4- Mutation spectrum, bounds, and dominant range of s: In general, the typical 
population will grow less rapidly than the mean. In Appendix A it is shown that this 
results in a bound for the (average) lead: 

Q > X = ^v-M ^) (33) 

where 7 is the value of A at which Mi = dMo/d\ = v. i.e. 

^1(7) = ^°(%) = / dsfi{s)se^ s = v . (34) 



d\ 

As we shall see, for short-tailed fi(s) for which Ai (\) is finite for all A, Q is close 
to x an d the contributions of the spectrum of s to the speed are close to those given 
by their weight in the condition .Mi (7) = v, i.e. weighted by e^ s . This is close to the 
condition anticipated from heuristic arguments given above which correctly concluded 
that the values of s that dominate for short-tailed n(s) are centered around the value 
s, at which exp(7s)/i(s) is maximum. The quantity 7 is close to the slope of a typical 
logp(x) at Q that appeared in the heuristic argument, justifying the notation used 
there. With the beneficial mutation rate small, as we are assuming, 

7§ > 1. (35) 

A pretty good approximation for short-tailed distributions is to replace /i(s) by UbS(s-s) 
with 

U b = M (l)e- rs : (36) 

this is close to the "predominant-s" approximation of DFG. Unless the curvature of 
A(s) = — log fi(s) is anomalously small, the range of s around s that substantially 
impact v is small compared to s, being of order 1 / ^Jd 2 A/ds 2 \s which is typically much 
less than s for 7s ^> 1. 

For example, if the distribution of beneficial mutations is a half-gaussian: fi(s) = 
2U/V2na 2 exp(-s 2 /2(T 2 ), 7 w ^W^ja and s w V^a, with £ = \og(a/U). But the 
range of s that dominate are within a of s since d 2 A/ds 2 = I /a 2 . The effective mutation 
rate to the predominant-range mutations is Ub ~ U 2 /a (so that £ = log(s/C/&) = 2£ ) 
which can be a tiny fraction of the total beneficial mutation rate, U . In terms of Ub, 
the range of s that dominates is narrower than s by a factor of 1/ \J\og{s/Ub). 



Asexual Evolution Waves: Fluctuations and Universality 



15 



3.0.5. Formal steady state p(x) : Formally, there exists a steady state solution to the 
average equation analyzed above, i.e., a fixed point p*(x) satisfying C v p* = as found 
in Appendix A.l. The function p*(x) is well-behaved for x less than some value - 
which turns out to be very close to Q — beyond which it becomes negative and starts 
to oscillate, while continuing to decay rapidly, as x increases. But a truncated p* 
restricted to the range x < Q, i- e - 



is essentially what many authors have analyzed as the "average" distribution, it is at 
best only a pseudo-average — closer to a logarithmically weighted average, exp[(logp)] 
- and very far from the actual average of p(x) which is dominated by rare fluctuations 
as we shall show in Sec. (5.3). 

3.1. Fixation probability at fixed speed 

The fixed point, (f>*(x), of the backwards in time equation for T) obeys 



with Cl<p given by Eq.(27). The fixed point, shown on a logarithmic scale in red in 
figure 2, has some simple features and some rather more subtle ones. 

High fitness limit: At large Xi the dominant balance is between the x$ an d 4> 2 terms 
so that Ri Xt the fixation probability of individuals that are substantially ahead of the 
average lead: this rarely occurs but when it does, the fixation probability is just the 
establishment probability as the future population will be almost entirely descendants 
of any such anomalously fit mutant that establishes and starts to grow deterministically. 
This is a consequence of the instability of the nose: if Q is substantially ahead of the 
typical lead, it will grow even faster giving the rest of the population even less chance 
to produce mutations quickly enough to catch up. 

Linear regime: For x substantially below some Q we expect that there is little 
chance that individuals will produce enough mutants to reach the lead and have a 
chance at taking over the population. Thus <p should be very small. As linear behavior 
in the dynamics of <fi corresponds to deterministic behavior in the dynamics of p(x), 
we expect this to obtain except near to the nose. Thus the simple — and correct - 
conjecture is that non-linearities in 0* will be unimportant except close to and above 
the value of x that roughly corresponds to the average lead of the nose, Q. 

Shoulder: Near to Q we expect a crossover between the fluctuation-dominated and 
deterministic regimes: i.e. from saturated to linear behavior of <fi. In the saturation 
regime the effects of mutations are small and we expect this to hold for at least some 
range below the shoulder at Q as the effects of the mutations from x + s > Q will 
be modest until </>(x) has dropped substantially below Q. If we can ignore both the 




£t0* _ 0*2 = 



(38) 
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Figure 2. Schematic of the backwards-time dynamics of the function <j>{x,t) that 
determines how the size of the population at time T depends on the fitness distribution 
at earlier times, t < T, with the speed of the mean of the fitness distribution fixed 
at v. Note the logarithmic scale for <j). The dynamics are shown in the moving frame 
with x — x — vt. The generating function (e~'> N ( T ^ is obtained by starting at t = T 
with <f> = C (horizontal line) and integrating backwards which results in log <fi becoming 
steeper with decreasing t until t = T — t sw (blue line). For x larger than a sharp 
shoulder (of width ~) whose position is denoted P(T — t),<fi saturates at <j> ps x which 
is almost horizontal on this logarithmic scale. For T — t > t sw , (f> rapidly attains its 
asymptotic shape which then moves slowly with P(T — t) converging exponentially to 
Q and <fr to its fixed point, 4>*(x) (red). The green arrow indicates where the most 
important mutations that advance the nose of the fitness distribution come from: these 
cause the slope of log <j> near the shoulder to saturate at 7 w t sw . 

The value of £ shown is much less than the inverse of the typical N, N: the 
generating function then gives information on the large N tail of its distribution. At 
fixed population size, such fluctuations drive fluctuations of the speed of the mean 
fitness. For £ > 1/N, the blue curve would be to the left of the red curve and the final 
approach to the fixed point would be from P < Q. 
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mutations and the difference between x an d Q in this shoulder regime, then we have 
simply 

~i + exp(0( x -g)/,) (39) 

which satisfies the simple equation, —vd(j)*/dx + Q4>* — 4>* 2 = (which is linear 
substantially below Q). Note that for convenience we have chosen the mid-point of 
the crossover to define Q. 

Multiple scales: The scale over which the crossover occurs is simply v/Q: the 
amount the mean fitness increases in a time that is the inverse of the growth rate of 
the lead population (which is also the typical time for new mutants that advance the 
lead to escape the fluctuation regime and become established). Note that this fitness 
scale is <C Q for large populations, and <C s for small mutation rates. Thus we have 
at least four scales of fitnesses: Q, ^Jv - the width of the population distribution — , s, 
and v/Q with Q S> \/y&s ^ v/Q. The inverses of these correspond, respectively, to 
the time for the nose population to grow by a factor of e, the time for the mean fitness 
to increase by the standard deviation of the fitness distribution, the range of times 
over which most of the mutants that advance the nose by one step are established, 
and the time, t sw , for the nose population to rise to be the largest population. For 
huge populations for which y/v ^> s, we will see that there is generally a fifth scale, 
of order (t>s)» which is intermediate between y/v and ^> s. Finally, except for the 
staircase model, there is a sixth scale, between v/Q and s, that is the range of s around 
s that contribute substantially to the dynamics, as discussed earlier. Several of the most 
significant fitness scales are illustrated in figure 1. 



3.1.1. Well-behaved solution: If the effects of mutations could be ignored for x we U 
below Q, the only solution in this regime would be simply exp[x 2 /2f] which is clearly 
nonsense and it can be checked that the effects of the mutational term cannot save 
it. But this observation suggests how Q will be determined. With only beneficial 
mutations, the fixed point equation can be integrated down from large x, starting with 
the general well-behaved solution in that regime: x ~ A exp(— x 2 /2t>), with A an 
arbitrary constant. This is obtained by linearizing around the ~ x limiting behavior 
and ignoring the effects of mutations which can be justified and treated perturbatively 
in this regime. The value of A will determine Q, defined roughly as the point at which 
the solution substantially deviates from <fi x so that it matches onto the solution in 
the shoulder regime derived above. For small A, the behavior in the linear regime below 
the shoulder will be dominated by the badly behaved upside-down gaussian solution. In 
contrast, for large A, it can be seen that will go negative and diverge to — oo. Adjusting 
A to find a well-behaved solution is thus like solving an eigenvalue problem by shooting 
from one end and adjusting a freely chosen coefficient to find the well-behaved solution 
(or solutions) at the other end. Thus we need to find a solution that is well-behaved in 
the linear regime. The detailed analysis is carried out in Appendix B. 



Asexual Evolution Waves: Fluctuations and Universality 



18 



Log-slope approximation: The most salient features in the linear regime are well 
captured by a WKB-like approximation. Writing 



<t> = Q exp 



7(xW 



(40) 



and assuming that the log-slope 
_ <91og0 

7(X) = (41) 
is not too rapidly varying with x, yields 

<f)(x + s) « 0(x) exp[s 7 (x)] (42) 

so that the mutational term is approximately -Mo[(7(x)]0(x)- 

The linearized fixed point equation in the log-slope approximation simply becomes: 

v-f - M (-f) « x (43) 

with x as a function of 7 obtained by integration. The well-behaved solution is that for 
which 7 — )■ 00 as x - >■ — 00: the corresponding is manifestly less than the "generic" 
badly-behaved upside-down gaussian solution which corresponds to 7 = x/ v - 

The WKB solution cannot be correct for x > X — v l ~ -Mo (7) at which ^(7) from 
Eq.(43) is maximum. An expansion around the WKB approximation suggests that it 
breaks down when 

X-X~ [^2(7)]* « (us) 3 (44) 

with M.i{l) = ■ For huge populations for which v 3> s 2 , this is a distance of more 
than s from x and the behavior changes over a substantial range of x- But for more 
modest population sizes with smaller v, the scale s is less than (us) 3 and higher order 
terms make the saddle point approximation break down for x — X °f order s: the scale at 
which the discreteness of the typical jumps, and corresponding fine-structure in <f>*(x), 
occurs. 

3.1.2. Moderate speeds, t)<I 2 ; To find the fixed point, 0*, we are faced with matching 
the linear solution(s) to the non-linear solution in the region just below Q. For small 
v/s 2 , a simple argument leads to the correct behavior. With Q not much different from 
X and Q — x substantially less than s, the corrections to the shoulder solution from 
mutations are not large. Specifically, the contributions to the fixed point equation from 
mutations are well approximated by the input from the non-linear region: Q Jq_ x /x(s)ds. 
As shown in Appendix B.l, this input is only important for x near Q— s and the condition 
that the solution matches onto the well-behaved linear solution yields 



Q w + l-s + O 



V 



1 



log(S7u) ^x + -~s + -\og{s 2 /v) 



2 



v 



s 



(45) 



_s 

(Note that the A^o(7) i n X is °f order v/s and thus smaller than the neglected correction 
terms.) The observation that the behavior for all x < Q is controlled by the behavior 
of the last mutational step below Q corresponds exactly with the approximations made 
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in DFG for the staircase model of ignoring the mutational input and fluctuations for 
all but the lead population. One important note, however, is that in this regime with 
v/s 2 small, there will be large oscillations in N(t) at fixed v caused by the close-to- 
periodic occurrence of the predominant mutations that control the dynamics of the 
lead. Correspondingly, there will be oscillations in j(x), the log-slope of 0, with period 
s — but their magnitude is small compared to the non-oscillatory part and decays more 
rapidly as x decreases. The corresponding oscillations in <f>(x) are, however, not small, 
being similar to those in N(t). 



3.1.3. High speeds, v ^> s 2 : For huge populations with v ^> s 2 , the saddle point 
approximation breaks down much further than s below Q. But (in contrast to the 
moderate speed regime) the behavior below Q is now smooth and one can expand in 
derivatives. It is convenient to pull out the rapidly varying factor, e^ x , and Q, defining 

(k P -i{Q-x) 

(«) 

so that the linearized fixed point equation becomes 

df 



= 



dx 



X)f + Jdsv(sy s [f(x + s)-f(x) 



>9V 

dx 2 



+ (x-x)f + o 



dx 3 



vs 



with the fitness scale determined by the second derivative term 



1 d 2 M , 

2 d\ 2 



(7) 



vs 
~2 



(47) 
(48) 

(49) 



This is Airy's equation with the condition that the solution is well behaved for x ~ X 
large mandating the Ai solution. Matching this onto the shoulder solution and the 
mutational effects near the shoulder indicate that the non-linearities become important 
only very close to the first negative zero of the Airy function which occurs when its 
argument is —a-]_. Analysis in Appendix B.2 yields 



Q « v^f - - + ai (vs/2)s + 0(1/7) 



(50) 



with the correction term of order the width of the shoulder region which is much smaller 
than s. 



3.2. Sharp shoulder approximation: 

In the limit in which the width of the crossover from linear to non-linear is much 
smaller than the other relevant scales — particularly s — we can treat the shoulder 
approximately as a sharp boundary at Q with the linear approximation taken all the 
way up to Q: this is valid with log(s/C/&) large. The inferred value of 0* at x — Q 
should be simply Q as would be obtained from extrapolation of the linear region of the 
shoulder solution. Using this, along with the appropriate conditions at x — Q, we can 
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find all the properties of 0* discussed above — and much more — far more directly. 
The approximate fixed point equation on the interval (—00, Q) is 



with the second term the input from the non-linear saturation regime (ignoring the 
difference between 0* ~ x and 0* « Q which is readily justified since Q » s). This can 
be solved formally by Laplace transforms as carried out in Appendix C. The condition 
for a well-behaved solution that determines Q is a "solvability" condition as occurs in 
many problems: it enforces the condition that the "projection" onto the badly behaved 
solution is zero. Since the linear operator is the adjoint of the operator that determines 
the formal fixed point of the average fitness distribution, p*, it is not surprising that the 
condition can be written in terms of the truncated, well-behaved, form of this: 



Eq. (52) is simply that the total rate of mutations to fitnesses higher than the lead is 
"typically" just the right amount needed to advance the nose at speed v — similar to 
what one might expect from the heuristic argument for the stochastic dynamics of the 
nose. 

The results quoted above that are derived in the Appendix by matching together 
the various regimes, can be obtained directly and far more efficiently from this 
solvability condition. Furthermore, as discussed in the last section, the sharp shoulder 
approximation is valid in many circumstances and can be used to quickly derive results 
in very different regimes than those analyzed here. In particular, Eq.(52) is applicable 
for obtaining 0* even if fj,(s) decays more slowly than exponentially for large s. 

3. 3. Fixation probability and determination of N 

The fixed point of the branching process does not provide any direct information about 
the N that corresponds to the assumed average speed, v in the fixed population size 
model. A natural approximation is to use the interpretation of 0*(x) as the fixation 
probability of individuals with fitness \ above the mean and use the fact that the 
descendants of one individual will eventually takeover the population. Thus the average 
fixation probability should be unity so that / dx<t>* (x) p(x) = ^(1) which fixes N as 
a function of v up to a factor of order unity, [11] hence v(N) much more accurately 
because of the dependence of v on log N. But it is not clear what p(x) should be used: 
as we shall see later, the average is certainly not correct. A natural guess is to use the 
same p that yielded the condition for Q(v): i.e. the formal fixed point truncated at Q 
leading to the conjecture that 




(51) 





(53) 



As we shall show, this integral is dominated by a narrow range just below Q. 
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It is not at all clear at this stage whether fluctuations in the speed induced by 
fluctuations of the nose and/or fluctuations in p(x,t), will invalidate the conjecture 
Eq.(53). We thus must analyze the fluctuations, which we do in the next section. 

3.4- Comparison with results of Good et al 

Good et al [12] have analyzed the fixation probability at fixed speed and found some of 
the same results discussed above, with a few differences. They work directly with the 
self-consistent equation for the fixation probability, which they denote w(x) (denoting 
by their x our x). This is readily derived from the observation that for the descendants 
of an individual not to fix, none of its descendants produced by mutation can fix. With 
mutations occurring continuously in time and populations approximated as continuous, 
this yields C)w — w 2 . Thus w = <fi*. 

The first three properties of 0* listed in Sec. 3.1 were obtained by Good et al. But 
they did not analyze correctly the behavior of 0* for x < Q'- they approximate it by the 
badly-behaved upside-down gaussian solution, 0* Qexp[(x 2 — Q 2 )/2t>]] for < x < Q, 
and zero for x < 0. If one uses the condition Eq.(53) to fix iV — the most natural 
condition in the fixation probability framework — then this approximation for 0* would 
imply that the distribution of fitnesses of the individuals whose descendants take over 
the population would be uniform on (0, Q) in contrast to the sharply peaked distribution 
just below Q that actually obtains as we show below. However, they primarily use a 
condition involving the fixation probability of new mutants, [11] 



which does not suffer from this pathology. They show, via comparisons with simulations, 
that this leads to good results in the population-size regimes that are quite far from the 
asymptotic behaviors on which we focus in this paper. 

For the fitness distribution, p(x), Good et al make a simple gaussian approximation 
which is good except in the Airy regime near the nose that obtains in the high-speed 
regime: while this should affect factors inside the logarithm in the relationship between 
log N and v, this is not a large effect. An exception for which the gaussian approximation 
for p is not good, is if deleterious mutations play a substantial role: we also have not 
analyzed this case (but see reference [23]). 

With long-tailed distributions, which we do not consider in detail here, the condition 
for the weighting of s that determines s and the predominant range around it is different 
than the simple exponential weighting of Eq.(34) which is non-sensical in this case. Good 
et al's analysis of the weighting includes the effects of the downward curvature of the 
fitness distribution from which the mutations come: although this is implicit in our 
general expressions that determine Q and hence 0*, such as Eq. (52), we have not 
analyzed this regime in detail. 




(54) 
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4. Dynamics and Distribution of N(t) at Fixed Speed 

From the branching process methods used above, we can analyze the fluctuations of N(t) 
at fixed v: these we can then use to infer the fluctuations of v(t) and other quantities 
at fixed N. We also consider distributions of other quantities, particularly the effective 
population size of the nose and its temporal fluctuations. Of primary interest is the 
typical, rather than average, behavior. 

We are interested in the distribution of N(T) given the distribution of p(x,t = 0). 
We thus compute the generating function: 



as a function of ( (here independent of x as N = J d>xp(x))- Although this generating 
function — simply the Laplace transform of the the distribution of N(T) — can only be 
inverted by continuing ( into the complex plane, as we shall see much information can 
be inferred from the behavior for real positive (. The relevant ( of interest are those of 
order the inverse of a typical population size, call this N, which we want to be of order 
the actual population size when it is held fixed. 

Defining t — T — r, the backward time dynamics of are given by |^ = £j,0 — 2 
with initial condition </>(%, r = 0) = ( independent of x with ( very small. The dynamics 
are illustrated schematically in figure 2. Initially, the log-slope of will be small enough 
that the mutational terms can be neglected. In the linear-regime, we then have 



P(t) is the time-dependent position of the shoulder, the value of x above which 
saturates at ~ x- As the log-slope in the linear regime is simply r, the mutational 
term will start to become important when r pa 7. This is close to, as could have 
been anticipated, the sweep time r sw after which the total population size is affected by 
fluctuations of the nose. Concretely, the subpopulation in the lead at time T—t sw will be 
the largest population at time T while up until this time the dynamics will be dominated 
by the almost deterministic growth of the already established subpopulations and the 
stochastic behavior of the advance of the nose via newly established subpopulations will 
not yet affect the total population size. This suggests that for T < t sw , the nonlinearities 
in |^ which reflect the fluctuations, should not much affect J 0(x, 0)p(x, 0): i.e. that 
with ( ~ 1/N the shoulder P(r), should, for r < t SW) be larger than Q in the regime 
where p(x) is close to zero. 

With Q t>7, there is one particular value of £, ( ~ Qe~® 2 / 2v at which 
P ( r = 7) ~ Q an d for this value the important parts of — the position of the 
shoulder and log-slope just below it — are quite close to the fixed point 0* already 
at t pa 7. The inverse of ( is a first approximation to the typical population size: 
iV ~ i ~ e® 2 / 2v /Q as from the heuristic arguments. 




(55) 




(56) 



(57) 
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The dynamics of 4>(x, T — r) under the full linear operator can be analyzed generally 
by Laplace transforms and approximately by the log-slope approximation. This must 
then be matched to the moving shoulder at P(r) to determine its dynamics self- 
consistently, as carried out in Appendix D. The approximation of ignoring mutations up 
until r pa r sw is shown to be rather good and after that time the most important effect 
of mutations is to shape and move the shoulder region centered at P(r). At t ^ t SW) 
the log-slope of at P(r sw ) is close to 7 so that has a shape similar to 0* — above, 
through, and to some distance below the shoulder — but it is shifted by P(t sw ) — Q. 
Below the shoulder the log-slope continues to increase for some time after t sw and 
drops concomitantly, but is prevented from decreasing too far by the mutations from 
higher x which quickly stabilize at close to the same shape as 0*. Thus on a time 
scale (asymptotically) much shorter than t sw , converges from its form at r = t sw to 
one of a family of functions, {0p(x)}, parametrized by the position of the shoulder, 
P. As long as P is relatively close to Q, the shape of all the 0p are very similar, even 
quantitatively, to 0*. 

But for P 7^ Q, the mutational input from the saturation regime to just below the 
shoulder is not quite balanced by the combined effects of the growth in the rest frame 
and the motion at speed v which acts to decrease 0. This imbalance results in P(t) 
slowly changing. For \P — Q\ <C Q — as it certainly will be unless (N is either huge or 
tiny — the dynamics of P after has converged to some 0p is simple: 
dP 

— « e(Q - P) (58) 

with e ~ v/Q ~ 1/t sw . The stability of the fixed point, 0*, backwards in time simply 
reflects the instability of the nose and is thus controlled by the same eigenvalue e. 

We could have directly analyzed the convergence of to 0* by linearizing around 0*. 
The dynamics of the deviation from the fixed point, A0 = 0—0*, is ^ {C\ — 20*)A0. 
The eigenvalues of C\ — 20* are all negative, with the least negative being — e. The 
corresponding eigenfunction, ip(x), must be 
<90p, 



dP 



\P=Q- 



(59) 



The other eigenvalues, which also control the approach to the whole family of 0p's, are 
(asymptotically) much larger than e, of order s for moderate speeds, and of order (vs) 3 
for high speeds. 

At the special value of (, (, the initial stage of the dynamics will take P(t sw ) to Q 
and then, much more quickly, take 0(x, t sw ) to 0*. This special behavior corresponds 
to vanishing amplitude of the slow transient (i.e. of the eigenfunction, ip). For larger (, 
P(t) will overshoot Q and then approach it from below, while for ( < (, P will approach 
Q from above as shown in figure 2. For a wide range of (, because of its logarithmic 
dependence on ( from Eq.(57), P(r sw ) will be close to Q, deviating from it as 

i>(W*0+ 1 ^. (60) 
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Since t sw ^> i, P{r sw ) deviates from Q by only a small fraction of s unless (/( is very 
large or very small. 

4-0.1. Distribution of N(T): We are now in a position to understand the statistics of 
the fluctuations in population size at fixed speed. A crucial simplification arises from 
the simple form of the slow transient to the fixed point 0*. After a time r sw , will 
be mostly determined by P(r sw ) independent of the earlier history, with P(r) then 
decaying exponentially from P{r sw ) to Q. The <j>p(x) that obtain after t sw are close 
to x f° r X > P-i have a narrow shoulder region of width v/P ps v/Q, and below the 
shoulder drop exponentially as ~ Pexp[7(x — P)] with steepening log-slope at lower 
X- This exponential behavior that is only weakly dependent of P, means that for any 
p(x) with support that does not extend beyond P, 

J d X Mx)p(x) ~ e^~ p ) j d X <j>*(x)p(x) (61) 

as long as the latter is dominated by the regime in which 0* has log-slope close to 7. As 
both p{x) and 0* are log-concave downwards, this will indeed be the case. Furthermore 
we expect that the actual p(x) will not usually extend beyond Q — the typical lead of 
the nose. Thus at least for P > Q, which will turn out to be the important regime, the 
approximation of Eq.(61) should be good. 

For the generating function for N(T) with T > r sw , the dynamics of P yield 

P(t = T)-Q^ [P(t sw ) - Q] exp[-e(T - t sw )} (62) 
« — log(C/C) exp[-e(T - r sw )\ (63) 
so that the needed quantity at time zero is 

dxHx, 0)p(x, 0) ~ / d X <l>*(x)p(x, 0) U (64) 



with the exponent 

a(t) w e- £t : (65) 

since e m 1/t sw , a(r sw ) 1/e. 

We have seen that the distribution of N(t + t sw ) depends on 

u L (t)= I ' d X <t>*{x)p{x,t) (66) 



- in particular (exp(— z/^(t))) - - from a time r sw earlier. But what about the 
distribution of i>l itself? Since this is — in the branching process — the sum over all the 
individuals at one time of the probability that each fixes, we expect that it will typically 
be of order unity. Indeed, since in the branching process the fate of each individual is 
independent, the quantity that arises in the generating function calculations 

(e-$ rp ) = 1 - V[&t least one fixes] (67) 

which should be of order unity. 
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The quantity vi,(t) plays a special role: its distribution at one time is essentially 
determined by its distribution at earlier times. This can be seen from its generating 
function, (exp(— 9i7l)), for which we are interested in 9 = 0(1) (in contrast to 
( = 0(1/ N) in the generating function for N). We thus need to analyze the backwards 
time dynamics starting from = 6(f)*. For x > Q, <fr w iH rapidly approach Xi while 
moving to higher x & t speed v. This will quickly produce a shoulder at some position, 
P g , which for 9 < 1 is P e Q + ^ log(l/0). Between (Q and P e the log-slope of will 

become close to 7 ^ ^. Thus after a time of order \og{l/9)/Q <C r est we will have 
^ 0p e . From then on, the dynamics is as above with exponential convergence of P{j) 
to Q. The result 

(e-^W) » ( e - ea(T) ^(*-)) , (68) 

with a{r) from Eq.(65), then follows. At fixed v^it — r), this is a the Laplace transform 
of a (one-sided) Levy distribution with parameter a{r). There is thus a long tail of the 
distribution of vl of the form 

ndv L ] ~ ^ (69) 
with the typical value 

I/L(*)~M*-T)] 1/a(T) (70) 

from the scaling with 0. This is a consequence of the exponential growth of fluctuations 
of the lead which corresponds to double exponential growth of ul- 

Note, however, that the simple result for the generating function that led to the 
Levy distribution is not valid for 9 substantially greater than unity as the approach to 
(j)p is then controlled by the non-linearities and Pe is no longer a simple function of 9. 
This means that the distribution of anomalously small vl is not given correctly by the 
above analysis. But this does not matter much: the probabilityis in any case very small 
in this regime as can be seen from the Levy distribution. 

The quantity ^l(^) is the effective number of individuals in the lead population 
at time t with the factor of 1/Q the average number of individuals with fitness greater 
than Q that is needed for one to establish and eventually be likely to fix. In the staircase 
model, the number in the lead population starts small and grows exponentially before 
the next higher fitness population establishes and advances the lead. It has been shown 
previously [10, 21] that when normalized by its typical value, the number in each lead 
population has a Levy distribution with exponent a — 1 — - with the scale given by 
the number in the previous lead population. When iterated through a series of steps 
this gives essentially the results above. The advantage of weighting with 0* to define 
u L , is that the effective number in the nose is defined at all times — essentially by their 
collective potential to takeover the population. Some such weighting is of course always 
needed for a continuous distribution /i(s) for which there is no obvious — and good - 
definition of the lead population. 

The instability of the front that is responsible for the exponentially broadening 
Levy distributions will be cutoff, as analyzed in Sec. (5), by changes in the speed of the 
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mean to keep the total population size fixed. As this will have occurred no longer ago 
than a time t sw before the lead population of interest is established, and the exponent 
oc{t sw ) = 1/e, we expect that at fixed N, vl will have a multiplicative width of order 
unity with the needed quantities, in particular (e~ 9uL ^j, that determines the distributions 
at later times being relatively well-behaved. But the resulting distributions will not, as 
we shall see, be Levy distributions. 

We now return to the distribution of N(T) at fixed speed. Given u L a time r > t sw 
earlier, we have that 

(e-^)^(exp[-z/ L (T-r)(C/C) Q(T ^ ) ]) ■ (71) 

At fixed vl(T — r), this is a Levy distribution with parameter a which, as for vl itself, 
yields a long tail of the distribution of N(T): for N ^> N m 

ndN] ~ ^ . (72) 

Note that this distribution has infinite mean for any a < 1 (as obtains here). Thus, as 
could have been anticipated from the analysis of the dynamics of the average population 
at fixed v, the mean is a very poor characterization of the distribution of iV at times 
more than t sw after typical steady-state conditions. 

By time T ps 2t sw later than a time with typical vl, the effects of the fluctuations 
of the nose will have caused the distribution of N(T) to have a width of order its typical 
value. But as many properties depend on logarithmically on NQ, the important result 
is that the width of the distribution of \og(N/N) is still quite narrow even for several 
times t sw . The distribution of \og(N() for the Levy distribution can be computed from 
its characteristic function. Since the factor of v L just sets the scale, its effects can be 
factored out: for simplicity we set it to unity and obtain 



t . t ^ = = xgm (73) 

whence cumulants can be found by expanding the logarithm of this in powers of (3. One 
thus finds that, given ul — 1 at time zero, 

log(7V(T)C)) = [exp(T/r w ) - l] lE (74) 

var[log(iV(T)C)] = ^[exp(2T/r slu ) - 1] (75) 
with 7s Euler's constant. 



4-0.2. Typical N and determining the speed, v(N): At long times the instability of the 
nose causes it to either run away, resulting in very large and rapidly growing populations, 
or fail to keep up with the advance of the mean (at speed v), causing iV to shrink rapidly 
and the population to go extinct. Which of these fates is more likely depends on the lead 
of the nose or, essentially equivalently, the value of N a time t sw later: the population 
scale N is, up to a multiplicative factor of order unity, the population size above which 
the population is likely to diverge and below which it is likely to go extinct. Note that 
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for long enough times that a is relatively small, Z(() = (e ^ N j^V N < ^ 
general result for broad distributions of Sec. 3. 0.2. Thus log[l/£] is roughly the median 
of logiV (up to differences between 1/e and 1/2, the latter being the probability that 
N is below its median value). As we shall see, the fluctuations are sufficiently small on 
time scales of order r sw , that N(v) is within a multiplicative factor of order unity of the 
value of N in the fixed population-size model that would evolve at average speed v. 

In the modest speed regime, with v/s 2 small, there will be large oscillations of the 
population distribution due to the roughly periodic advance of the nose. This will be 
true even with a distribution of fitness increments: a large — 0(1) — fluctuation of the 
population in the nose will occur on average once per time r sw and the descendants of 
the particular early-establishing mutation that caused it will contribute a substantial 
fraction of the population for a time at least of order r sw with the fitness of these 
descendants increasing by close to evenly spaced establishments. These close-to-periodic 
oscillations are reflected in the roughly periodic (in P — x) structure of the log-slope of 
both 0* and the slow transients (ftp, superimposed on top of their average log-slope of 7 
near their shoulder. These yield oscillations in / 4> P p which determines the distribution 
of N(T). It is useful to consider v(t) having an oscillatory component — as it will at 
fixed iV — that suppresses these oscillations. As they are fast on the time scale of r sw on 
which the nose becomes unstable, the mean v will determine the longer time behavior of 
interest. Indeed, on the time scale of one step, s/v ~ r sw /q, the Levy exponent a only 
decreases from unity — corresponding to a distribution 5(N — N) — to 1 — -: thus the 
distribution of the non-oscillating component of N hardly broadens at all. In practice, 
as noted earlier, the rapid oscillations at fixed speed are not large for reasonable ranges 
of parameters as v/s 2 in the multiple mutations regime of interest is unlikely to be very 
small. 

As long as the approximation (discussed in Appendix D) that the effects of 
mutations on the dynamics of P and determination of Q are negligible except for the 
direct input from the saturated region to around P — s, is valid — which it will be for 
moderate speeds — the heuristic result that \og(NQ) ~ is good. 

In the high speed limit, however, the transient 0p will, like 0*, have an Airy regime 
just below the shoulder and this will contribute corrections to log TV. But these are 
small compared to the dominant term above. Thus in inverting N(v) to obtain v, they 
will only result in small corrections (down by 1/q times log-log factors). Although more 
detailed analysis is possible in this regime, we will not carry it out here. 

We have found that the condition that relates N to v is that N is, up to a numerical 
factor of order unity, the inverse of ((v) which is determined uniquely by the condition 
that the backwards-time dynamics at speed v starting from a constant 0, <f>(x, T) = £, 
converges to 0* faster than e~ tv<<T ~ t \ i.e. with zero amplitude of the slowest eigenvector. 
This is true very generally - - including with long-tailed fi(s) and with deleterious 
mutations — as long as the fluctuations are not so large that non-linear effects change v 
substantially: in the case analyzed here, they do not as we shall see in the next section. 
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Thus to very good accuracy, for the fixed population size model 

v(N) is determined by N(v) = . (76) 

CO') 

4-1. "Typical" p{x) and weighted fixation probability 

Thus far we have not explicitly studied the typical fitness distribution of the population. 
This will have structure on the scale of s reflecting the roughly periodic nature of 
the advance of the nose: we discuss this at the end of this subsection. But for 
understanding the longer time scale dynamics we are most interested in the envelope of 
fitness distribution which will be relatively smooth: call this p(x)- 

4-1.1. Envelope of fitness distribution: With the guess from the above analysis (and 
justified further below) that typical fluctuations in p(x) are only by multiplicative factors 
of order unity and that these are driven almost entirely by the fluctuations of the nose, 
we could analyze the deterministic approximation with a cutoff at the nose representing 
the effects of the absence, typically, of individuals with fitness more than Q above the 
mean. This is close to the original approach of Tsimring et al [4] . The result would be 

P « P*q (77) 
the formal fixed point of the deterministic dynamics truncated at Q. This gives as good 
an approximation for the typical p as is possible given the looseness of the definition of 
"typical". 

The distribution p*Q can be analyzed similarly to the linear part of 0* either using 
Laplace transforms, as in the Appendix, or approximately in terms of the log-slope 

£(X) - =2g& . (78) 

This is defined with a minus sign to reflect the difference in the sign of the vJ^ term 
between C) and C which causes p* to decrease with x while 0* increases. With the 
log-slope slowly varying on a scale of s, we have 

X~<-M)(0- (79) 
This is exactly the same equation as for 7 = d\og((f)*) / dx, but we are interested in the 
other branch of the solution: that with £ increasing from zero at x = to 7 at x which 
is close to Q. Over all but the last part of this range of Xi t ne mutational term is small: 
this yields a p which is close to a simple gaussian (as assumed by Good et al [12] and 
others). 

Near to x (where the log-slope approximation breaks down) and beyond that to Q, 
the behavior will change, similarly to 0*. In the high speed limit, p in this regime will 
be an exponential e"^ x times the same Airy function with Q just below the negative of 
its first zero. For moderate speeds, there will be some additional structure from Q — s 
to Q before cutting off p at Q. If 

HQ) ~ i , (so) 
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representing a just established population of size 1/Q spread out over the scale of the 
typical fitness increments s, the normalization will be roughly correct with 



J P(x)dx 



N (81) 



With fixed population size rather than fixed v, this should give the correct relation 
between N and the mean speed v, up to a multiplicative factor inside log(NQ) of order 
unity (or perhaps logarithmically larger). 

4-1-2. Affecting the future: We can now find the relative contributions of different 
parts of the fitness distribution to the future. The probability of the individual whose 
descendants will take over the population having fitness x above the mean is roughly 
given by 

PtakeoveriX) ~ (82) 

which, because both logp and log0* curve downwards below Q, is dominated by a small 
range near Q. This dominant range has width of order s for moderate speeds — in 
the staircase model simply the lead population — and of order the width of the Airy 
regime, (us) 3, for high speeds. 

The reason that at high speeds an individual that is a of multiple s behind the 
nose can takeover the population, is that mutations from lower fitness continue to 
contribute substantially to the growth of the fittest deterministic subpopulation even 
while this is producing enough mutants to establish the next subpopulation. But the 
same is true for the next-to-lead population being fed by the one below it and so 
on back of order (t>/s 2 )3 steps. Note, however, that this does not imply that these 
populations are stochastic: they are not; it is only that their deterministic dynamics 
involves deterministic mutational input — including to the lead population — as well 
as deterministic growth. 



4-1.3. Fine structure of fitness distribution: In the staircase model, the fitness 
distribution is a series of equally separated delta-function peaks, as shown in figure 1. 
One might expect that this would also be the case for short-tailed /i(s) albeit with each 
peak now a clump of subpopulations with slightly different fitnesses and the separations 
between these clumps varying somewhat around s. But the actual behavior is more 
complicated as illustrated in figure 3. 

Consider a single lead population with a specific fitness and no others close enough 
to contribute much to the future evolution. The important mutations from this will 
have a range of s around s that is narrower than s: with width of order s/^/I 
assuming; d 2 K/ds 2 ~ A(s)/s 2 ). But in this range there will be a considerable number of 
independently established populations with somewhat different fitnesses and typically 
none of them will dominate. A time r es t later, these will have collectively established 
a clump of new populations whose spread in fitness will be larger by a factor of \/2, 
as the fitness increments are independent of the previous ones. Similarly, after h new 
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Figure 3. Schematic of the fine structure of the fitness distribution for short-tailed 
mutation distributions, fi(s), in the huge population, high speed, regime. The sizes, n, 
of the subpopulations are indicated on a linear scale as a function of their fitness, x. 
Only the central portion of the distribution is shown. Typically, the fitness distribution 
is a dense set of closely spaced subpopulations (blue) with an envelope that is close 
to gaussian with mean x and standard deviation ^/v. Occasionally — on average once 
per time t sw — a lucky mutant is established anomalously early and its population 
(thick black bar) rises well above the others with similar fitness. The leftmost red 
arrow indicates such a lucky mutation (which occurred some time ago at the nose 
of the fitness distribution). The descendants of this lucky mutant are shown by the 
black bars, with the mutations that gave rise to them and advanced the nose indicated 
by the other red arrows. The first set of these mutations has a narrow range of 
fitness increments around s. This leads to a clump of subpopulations that stands out 
from the background of typical ones that arose at similar times. After a few further 
sets of mutations, the clumps will have broadened, as shown, and the descendent 
subpopulations from the lucky mutant become a dense set with the roughly periodic 
structure no longer apparent. 

The bulk of the distribution is advancing at speed v, while all the lucky mutant 
populations are growing (vertical arrows) at rates equal to their fitness above the mean. 
Soon after the time shown, the original lucky mutant subpopulation will become half 
the total population and the mean will jump rapidly to its fitness (at a speed a few 
times v). The mean will then stop increasing until the descendent populations (black 
clumps) take over. At that time the mean will start increasing again at speed v and 
the envelope of the distribution will become gaussian — until the next lucky mutation. 
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sets of establishments, the width of the clump will be Vh times as large. After h ~ £ 
steps, one would expect the fitness distribution near the nose to be spread out by 
as much as the typical spacing (s) between the clumps and after that no periodic 
structure would remain. But this is not correct: on average once in each q steps, one 
of the new establishments will occur early enough to contribute a large fraction of the 
total population of the newly established clump. Although the others will collectively 
contribute to the future, the lucky subpopulation will generate a roughly periodic set 
of clumps with the total population in each clump well above that in the background 
of the many small subpopulations from the less-lucky ones. These clumps will again 
broaden out over of order £ steps until the process repeats. 

For modest-sized populations, £ > q (this is equivalent to the condition for being 
in the modest-population regime). Thus anomalously lucky mutations will occur often 
enough that the roughly periodic structure will persist until the next lucky one occurs: 
the fitness distribution will look similar to figure 1 but with each peak replaced by a 
clump of peaks with smaller peaks between the clumps. But for huge populations, £ <^ q 
so that the periodic clumps will have broadened out and disappeared into the roughly 
uniform background before the next lucky mutation. Thus over most of the width of the 
fitness distribution, there will not be pronounced clumps. But on average twice in the 
2Q wide distribution, there will be a sharp peak — a delta-function with a single fitness 
- followed by a decaying and broadening series of clumps. This behavior is illustrated 
schematically in figure 3. 

In the discussion above, we were sloppy about what constitutes a clump: what 
matters is the size of sub-populations relative to the envelope of the fitness distribution 
(for the same reason as the exponential weighting of higher values of x reflected in the 
definition of vl). Near the mean of the distribution this distinction does not matter 
much, but near the nose it does. In particular, if one considers mutations from a single 
peak near the nose, far more will have small fitness increments than will have s ~ s. 
Although the populations these establish will not grow as fast as those with larger s, 
they will nevertheless constiute for some time a higher population of new mutants near 
to the initial peak than further ahead. Thus to observe the decaying clump structure one 
needs to normalize each subpopulation by the envelope of the typical fitness distribution. 

5. Feedback and Coupled Dynamics of Nose and Mean 

Armed with a general analysis of the stochastic effects of the fluctuations of the nose on 
the distribution of the population size at fixed speed, we can now analyze the dynamical 
adjustments of the speed required to keep iV fixed and the statistics of the fluctuations 
that arise from the coupled dynamics of the nose and the speed of the bulk of the 
population. These give rise to an effective model on time scales of order t sw which can 
be analyzed in detail. But the fluctuations also require adjustments of the speed on 
intermediate time scales which leads to larger than expected jumps in the mean fitness. 
Although the typical fluctuations of the fitness distribution are modest, very rare large 
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fluctuations dominate the average of the fitness distribution: these are analyzed in later 
subsections. 



5.1. Dynamics of feedback 

On time scales of order t sw on which the important feedback dynamics occurs, we can 
approximate the advance of the lead, Q(t), by a differential equation with a deterministic 
part and a stochastic part that together will cause the unstable stochastic dynamics at 
fixed v. But we must also include the effects of the mean advancing faster or slower than 
average in response to earlier fluctuations: these will decrease or increase Q, respectively. 
It is easiest to work in the rest frame with F(t) the position of the front and T(t) the 
position of the mean (more precisely the value needed to keep dN/dt zero). We then 
have that 

dF 

~ v + e [ F -T -Q}+ri(t) (83) 

for F — T close to Q. The eigenvalue, e, of the instability is close to 1/t sw , and converges 
to it in the limit of high v. The stochastic part of the front speed, r](t), has mean zero 
and consists, roughly, of a small negative constant part plus positive spikes with close- 
to-exponentially distributed magnitudes that give rise to the Levy distribution of the 
population size at later times. The effective noise, rj(t), has substantial correlations only 
over times of order single mutational steps which is smaller by a factor of q than r sw : 
this separation of time scales gives rise to simplifications which we have already made 
use of. The variance of the noise is related to the variance of log N that it would causes 
a time r sw later if v were fixed: we will use this to compute the statistics of the noise, 
most simply its variance. 

A sub-population that was in the lead at fitness F at time t , having just been 
established by reaching a size of order l/Q, will, at time T, have grown to a size 



n(T; t) « i exp 



F(t)(T-t)- f T dt'T(t') 



(84) 



(Note that we have here lumped together all the subpopulations in each peak of the 
fitness distribution discussed in the precious section.) 

The total population at time T is roughly the maximum over t of these sub- 
populations: 

N&maxn(T;t) : (85) 

this maximum is achieved at some time tx- The derivative of log N(T) with respect to 
T must be zero. This and the expressions for n(T, t) and N together imply that 

dF 

T(T) = T(f r ) + (T - t T )^(t T ) = F(tr) (86) 

the equality of the first and last expression is simply that the sub-population at any 
fixed fitness reaches its maximum when the mean catches up to it. 
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The effects of the front advancing non-uniformly will, on average, be felt a time r sw 
later. Thus we consider small fluctuations of the time delay T — t T around this, defining 

St (T) =T-t T -r sw , (87) 

and expand to linear order in 5t. We can then Fourier transform the dynamics in time, 
defining 5t(u) and i](u), and obtain the variable part of the speed, u(t) = dT/dt — v 
with transform u{uj): 

u(u)= V (u)— (-^ sw ? (g8) 

{-iLUT sw - eT sw ){e-' lu)T ^ - 1) - iUT sw er sw 

the e~ WTsw factor arising from the time delay between the fluctuations of the nose and 
its effect on the speed. 

In the absence of the feedback from the adjustment of the speed a time t sw after a 
fluctuation of the nose, response functions to 7] would have had a simple pole l/(— ioo — e) 
reflecting the exponential instability of the nose that would then have existed. This is 
stabilized by the feedback: the only singularities in the frequency dependent response 
are a series of poles in the lower-half u plane; nevertheless, oscillations, albeit strongly 
damped ones, will occur in response to a fluctuation of the nose. The fluctuations in 
the fitness of the nose, F(u), are 

g — lUITsw I 

5F(u) = r^H— — — ; . (89) 

(-iu)T sw - er sw )(e tu > T °™ - 1) - iUT sw er sw 

5.1.1. Autocorrelations of the lead: The fluctuations in the lead, Q = F — T, are 

p — IU>T SW 1 _|_ • ._ 

sow = t swV (oj)— — ^ Tr • — • ( 9 °) 

{-iut sw - er sw )(e lu)T ^ - 1) - iut sw €T sw 
With the feedback, the variance of the fluctuations in Q are finite and at low frequencies, 
8Q(u) ~ t sw t]{u). Surprisingly, for e = 1/t sw , the absolute value of the linear kernel 
relating 5Q to t sw i] is unity at all frequencies of order r sw . This means that the covariance 
of 5Q, (5Q(t + r)SQ(t)), is proportional to the covariance of the stochastic part of the 
speed of the nose — hardly to have been expected! - and has correlations only on 
the short time scales of order r sw /q of individual mutational steps. One would have 
expected that a larger than average Q at one time, t, would tend, via the higher speed 
of the nose that it implies, to cause a larger than average Q for times up to at least 
t sw later since the feedback that limits further fluctuations will not occur until then. 
But the rate of change of Q(t) is also affected by changes in the speed of the mean and 
these are anti-correlated, due to the feedback, with SQ at earlier times. The chance 
that SF(t) is positive is positively correlated with 5Q. But changes in T(t) can be 
much sharper than changes in F(t) and thus dominate short-time changes in Q(t). For 
e = 1/t sw these effects, taken together, cancel on average. More generally, for er sw 
slightly different from unity, the residual correlations on time scales of order t sw will 
be present but small. Note, however, that these results only apply to the covariance: 
higher order correlations of 5Q(t) should show structure on time scales of order r sw ; 
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as the fluctuations in rj(t) are very asymmetric, third order correlations should exhibit 
such structure. General higher order correlations are discussed below. 

5.1.2. Nose fluctuations and fitness diffusion coefficient: At low frequencies the 
fluctuations in ^ and must be the same. These are proportional to t)(uj), the 
fluctuations of the speed of nose, but larger by a factor of -. — r — ~ 2 since e ~ — . 

1 1-^€T SW t sw 

This factor is due to the amplification of the short-time fluctuations over times of order 
r sw . The nose and the mean fitness will thus advance at average speed v plus a stochastic 
part that is diffusive at long times with diffusion coefficient, 

DkU—\ ) C v with C v = fdt(rj(t)rj(0)) . (91) 

V 2 ^ Tsw ) 

The covariance of r)(t) can be obtained by using the results we have derived for 
constant speed, i.e. in the absence of feedback. A fluctuation in F results, at constant 
speed, in a fluctuation of log N a time t sw later of magnitude 

5 log N(t) « ^-5F(t-T sw ) (92) 

v 

with the coefficient ^ = t sw . After a further time, r, has elapsed, the distribution 
of N/N is given by the Levy distribution with parameter a(r) ~ e~ T / Tsw . As derived 
above, this implies a variance in log N equal to ^(a~ 2 — 1) which is ~ t/t sw for r <C t sw . 
But we can also derive these fluctuations by integrating the effects of the fluctuations 
of f] exponentially amplified by the instability over a time r. This yields 

var[logiV(t + r) - logiV(t)] « [e 2 ^™ - l] C v (93) 

so that C v = vr 2 /3r s 3 u ,: this is equivalent to the result derived in DFG in terms of 
the fluctuations in the establishment times. Note that the correlations of rj are only 
over times of order s/v which is much shorter than t sw thus the integral over these 
correlations is all that matters (up to subtleties discussed below) for variances on time 
scales of order t sw . 

By equating the results from the nose fluctuations to the phenomenological ones in 
terms of rj, the diffusion coefficient of the mean of the population at fixed iV is found to 
be 

2tt 2 

D « — (94) 

°'sw 

a result that should become exact at asymptotically high speeds. Note that because of 
the amplification caused by instability of the nose, this is larger by a factor of four than 
would have been expected from the short-time fluctuations of the speed of the nose. 

This D implies that over a time of order t sw , the variations of the cumulative 
advance of the nose are typically only of order 1/t sw which is much smaller (by 
l/log(s/C/&)) than the typical fitness increment of s, and even smaller, by 1/ log(Ns), 
relative to the cumulative fitness increase over t sw which is Q. As the fitness of the lead 
population jumps by around s with the establishment of each new lead population, the 
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fitness of the nose must be denned so as to smooth out these jumps. At constant speed, 
we have seen that the appropriate weighting of the fitness distribution is by the fixation 
probability: vl = J <j>* P ~ e~ 5 ® Taw is the quantity that determines the speed a time r sw 
later. This suggests defining F at time t in terms of p(x, t) in the rest frame to be the 
F that makes 



with the weighting function <p F simply equal to 0* shifted so that the position of the 
shoulder is F rather than Q. With this definition, F will, indeed, have fluctuations 
around its steady increase that are jumps forward by of order l/r sw at an average rate 
of once per time r sw , and backwards jumps times r sw later caused by the compensatory 
changes in T(t) — as we shall show. The fluctuations of F are thus of the same order as 
the ambiguity in the position of the nose due to the width of the shoulder region: ~ = — 



(and the concomitant inaccuracies of the sharp shoulder approximation). Changing the 
definition of F by of order this ambiguity is equivalent to choosing in Eq.(95) another 
number rather than unity. 

While the position of the nose can be accurately defined, the nose population, vl, 
is no longer a natural quantity because it is unclear what speed, or even what position 
of the mean, to use in defining the appropriate 0* for the weighting of p. But having 
used vl to infer the statistics of fluctuations of the fitness, F, of the nose at constant 
speed and the implied adjustments of T needed, we can now use F instead. Note, 
however, that the fact that the fluctuations in Q(t) do not accumulate over time scales 
substantially larger than t sw , implies that, if the speed were held constant for times of 
order r sw , the fluctuations in the nose population would, as previously assumed, only 
be by factors of order unity (albeit with a long tail from the Levy distribution). Thus 
our approximations are self-consistent. 

Although the fluctuations of F are quite small, one has to, nevertheless, be careful 
as to which other quantities are considered: as we discuss below, T(t) and Q(t) typically 
have much larger fluctuations than F(t) on time scales intermediate between r est and 
r sw : these are indicated by the presence of delta function components of the associated 
response functions, such as in Eq.(90), a time t sw after a jump in F. 

5.1.3. General analysis of dynamics on feedback time scales: Thus far we have only 
used the variance of the fluctuations of the speed of the nose. But we can use the 
branching process analysis to do far better. We continue to focus on time scales of 
order r sw in the asymptotic high speed limit. The crucial simplifications are: that the 
statistics of the jumps in the position of the front are independent (except on time scales 
much shorter than t sw ), that the fluctuations in Q are small enough that the distribution 
of the sizes of the jumps is essentially independent of Q, and that the dynamics and 
effects of feedback changes in the speed are linear as analyzed above. Moreover, the 
Levy process dynamics of the nose gives us the statistics of the jumps. 

Consider a general quantity (such as the advance in the front over some time 




(95) 



Q 
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e~ zX 



interval) that is a linear function of the past dynamics of the front with some kernel 
Kx{t): X(t) ps dr Kx(T)i](t — r). Since the 77's are independent, the generating 
function of X can be immediately found from that of the 77's. In the continuum limit 
(valid for large q = r sw /T est ), r](t) is the derivative of a jump process: it consists of 
a small constant negative part (to make its mean zero) and a series of positive delta- 
functions that occur as a Poisson process at rate l/r sw with independently distributed 
magnitudes. The typical magnitude of the jumps is ~ ~ — so it is convenient to define 
their magnitudes in these units as — with the generating function of the distribution of 

T~sw 

the u>'s given by the formal limit of a — >■ 1 of the result obtained from the Levy process, 
Eq.(73): 

e— ) = lim [ r(1 + * (1 + 6)) - ll = zrKz) (96) 

/ 6^0 [ e r(i + z ) ^ K J K ' 

with ip{z) = ^ the digamma function. [There are subtleties associated with the 
asymptotically delta-function part for w within of order e of zero as indicated by the 
bad behavior in the large z limit, but these will disappear with the integral over time, 
below.] From Eq.(96) we obtain the generating function for the distribution of the X: 

exp {— r drK(r) [z^{\ + zK{t)) + z lE \ ) (97) 

I T sw JO J 

with the ^- the rate of jumps and 775 = — ^(1) making (77) = as it was defined 

to be. For any bounded Kx, the distribution of X falls off exponentially reflecting 

the exponential tail of the distribution of the w's — proportional to e~ w - - which 

corresponds to the power law tail of the Levy distribution of the nose-population. Note 

that for the dynamics of the nose population, a change in T corresponds to dividing 

v L by a factor that is exponential of the integrated effects of the change: the analysis 

in terms of the position of the nose (which is related to the logarithm of u L ) is exactly 

equivalent to combining this feedback effect with the Levy branching process. 

There is one important caveat to the analysis in terms of linear functions of 77: 

these are only meaningful if the kernel Kx does not contain any delta-function parts: 

when it does, as for T and Q, the linear relation between X and 77 must break down 

and additional features appear as we analyze in the next subsection. But for well 

behaved quantities, the full distribution can, at least in principle, be computed from the 

generating function. A natural quantity to consider is the amount the nose advances in 

time t sw , Y(t) = F(t) — F(t — t sw ). The associated kernel is well behaved at both high 

and low frequencies: Ky has no delta-functions and its integral is finite. The finiteness 

of the integral of Ky — i.e. its zero frequency limit — follows from the observation that 

once the effects of the shift in the sweep time, Eq.(87), are incorporated, Y is driven by 

the front fluctuations as 

dY Y(t) -Y(t-T sw ) . . n . . . n . 

= -A2 ^ ^1 + v (t) _ 2r)(t - t sw ) + r)(t - 2t sw ) (98) 

(XI Tsw 

with the discrete time differences acting like a second derivative of 77 at low frequencies. 
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5.2. Jumps in mean fitness and clustering of fixations 

A jump in the position of the front would, a time r sw later, cause a jump in the total 
population if T did not adjust to compensate. But since T(t) only controls ^ rather 
than N directly, to compensate for a jump at the nose T — v t must have a sharp spike. 
In the linearized analysis, this is seen in the high frequency behavior of T(u) which is 
proportional to i](u), in contrast to the front position which advances as the integral 
of rj(t). [As Q = F — T, Q(t) exhibits the same sharp spikes but with the opposite 
sign.] Of course, these spikes, which are approximately delta-functions on the slow t sw 
time-scale, must be rounded out on shorter time scales: the natural expectation is on 
the scale of the steps of advance of the front, r est , but this is not correct. 

A jump in F(t) can occur via an atypically early establishment, a larger than 
typical s, or a mutation from slightly closer to the nose than typical. As long 
as fx(s) falls-off faster than exponentially for large s, the effects of any of these is 
similar and all contribute to the distribution of jumps discussed above. An increase 
of Q from Q to Q + SQ, causes, a time t later, the resulting subpopulation to be 
n(t) ~ exp[(Q + SQ)t — \vt 2 \ which reaches the total population size, N ~ exp(^j-) 

a time 5t ~ ^ 2 ® SQ earlier than it would have starting from Q. To keep N constant, 



this necessitates a jump in the mean fitness of 5T = v5t ~ y2Q5Q — the square root 

appearing because of the quadratic maximum of n(t). With 5Q = w^, the amplitude 
w is of order unity and 



with w distributed as in Eq.(96) below. This distribution falls-off as e w for large w 
corresponding to a tail in the probability density of ST of the form 



Thus the typical scale of a jump in the mean fitness is of order y/v: far bigger than the 
typical jumps of F(t) which are of size J. 

After the mean fitness jumps, it will not advance further until the effects of the 
continuing advance of the nose catch up: this takes a time ST/v to occur. The change 
in T integrated over this time is simply w which is the factor by which the jump of 
the nose causes the nose-population to increase and the weight of the delta-function in 
the dynamics of T(t) on time scales of order t sw . One might worry that such strong 
non-linearities in the response to jumps of the front would invalidate the linearized 
results beyond this rounding of the delta-functions. But as jumps typically occur only 
once every time t sw , the spikes will rarely overlap and do act almost independently as 
approximate delta-functions without compounding non-linear effects. 

The jumps in the mean fitness are not really sharp because of the time it takes 
for a population fitter by of order y/v to take over from a less fit one: this time is of 
order \j\/v. Thus the jumps in the mean fitness will be smeared out on the same time 
scale as the duration of their effects except for relatively rare ones with w substantially 
larger than unity which will have more marked effects. The speed of the mean of the 





(99) 



V[d5T] ~ 5Te- (5r)2/2v d5T. 



(100) 
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population, will change during the jumps by an amount of order the average speed: 
for jumps with w substantially larger than unity, ^ will increase to of order y/wv and 
then decrease to close to zero. Thus even in the huge-population regime in which there 
are not substantial variations in the speed associated with the successive establishments, 
the speed will still vary by factors of order unity. 

Note that although the jumps are quite smeared out, they dominate fluctuations 
in the cumulative increase in T for times considerably longer than r sw — up to times of 
order (log Ns)t sw — implying that the result in the previous subsection for the diffusion 
coefficient of F(t) and hence T(t), only applies for the latter on very long time scales. 

In the modest-size population regime, the jumps in the mean fitness from 
anomalously early mutations advancing the lead are in addition to the roughly regular 
jumps of sizes close to s from the typical steps of the nose. The anomalous jumps 
correspond to the mean taking several steps at a time, advancing quickly by those 
several s. 

One might guess that the jumps in the mean fitness would be associated with 
fixations as each jump arose from an anomalously lucky mutation in the nose that by 
itself contributed a substantial fraction of the then-new lead population. But during 
the time in which this lead population grew to collectively become the largest sub- 
population, the fraction of it that descended from the lucky mutation did not change 
much. Thus for this mutation to takeover the whole population, its descendants must 
accumulate further beneficial mutations sufficiently rapidly. For the original lucky 
mutation to fix in the population — or for a less lucky mutation whose descendants 
are sufficiently lucky to do so — typically takes a few times r sw . This process was 
analyzed in reference [21] where it was shown that the time to fixation — equal to the 
coalescence time of the whole population — is t sw [log log N + 0(1)} (a seventh relevant 
time scale). But there is an interesting feature that arises from the same source as the 
jumps in the mean fitness: the mutations do not fix one after another but tend to fix in 
clusters of several at a time. Park et al [1] found this in simulations and observed that the 
distribution of the number, k, that fix together is close to an exponential distribution. 
Analysis along the lines of this paper and ref. [21] show that the tail of the distribution 
of k is indeed generally exponential and that in the limit of large q, the distribution is 
exactly exponential with mean simply q. Thus Q is not only the additional fitness that 
the fittest member of a mutant lineage has accumulated by the time the lineage has risen 
to a significant fraction of the population, it is also the average increase in the fitness 
associated with a fixation event. Again, we see several difference fitness increments - 
s, y/v, and Q — here associated with different properties of lucky mutants and their 
descendants. 

5.3. Average fitness distribution 

The fluctuations in the nose, although limited by feedback from adjustments of the 
speed to keep the population size fixed, can, nevertheless, occasionally be so large as 
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to make the mean of the fitness distribution, (p(x)), much larger than its typical form. 
The average fitness distribution is controlled by extremely rare large deviations of the 
position of the nose which are caused by very anomalously early establishments of a 
new lead population — far earlier than the typical jumps analyzed above which occur 
on average once per time t sw . 

To study the effect of anomalously large jumps of the nose, we can approximate 
the speed of the mean as roughly constant over the time between the jump and when 
it affects the mean. As analyzed above (and in DFG), the statistics of the jumps 
drive the distribution of the population size, vl, of the new nose, with a power law 
tail proportional to du L /u L 1+a and the exponent after one step advance of the nose of 
a — 1 — -. (This corresponds to the exponential tail of the distribution of magnitudes 
of the jumps in F(t) as discussed above). As the earliest possible establishment time 
is just after the previous lead population has established, it is earlier than typical by 
( T est) ~ £/Q and will result in a population at the new nose larger than typical by a 
factor of exp[£(l + ^)] (with £ = \og(s/U b )). This thus gives a cutoff for the distribution 



such rare events yield a contribution to (u L ) of order exp(y). In the limit of large 
L = log(iVs), Ijq is small so this is not a substantial increase in (vl)- But two effects 
have been left out of this simple estimate. First, the exponential instability of the 
nose causes subsequent establishments of new lead populations to be more likely to also 
occur earlier than typical. And second and more important, a series of anomalously early 
establishments, while even rarer than a single one, can give much larger contributions 
to (p(x)) near t° the typical nose as well as between the nose and the peak of the fitness 
distribution, and, indeed, further ahead than the typical nose. 

Consider an anomalously early establishment of h steps in succession. If each of 
these is as early as possible — occurring within a time around 1/Q of the previous 
one — the probability of this happening is (e~ e ) h since each there is a factor of Uf, for 
each mutation that occurs before any amplification of the lead population. As we are 
interested in the universal large q limit it is convenient to define such a jump in the lead 
in units of the typical lead, Q: A = = ~ — 1 = so that a jump in Q of h steps of 
size s corresponds to A = K Since q — this occurs with probability e~ 2LA ~ -^a: 
simply a continuation of the exponential distribution of w out to anomalously large 
values. As the initial growth rate of the lead population is simply Q, the anomalous 
lead population grows with rate 2LA/r sw . It is natural to rescale time and fitnesses, 
defining T = t/r sw and X = x/Q- Then the mutation rate, which appears only via £, 
drops out. 

From these scalings of the properties of rare jumps of the nose, we can guess the 
form of the averages of the fitness distribution that they result in. As the logarithm of 
the probability of a fluctuation by A and the logarithm of p(x) are both proportional to 
L, we can anticipate that the averages of p(x) resulting from such fluctuations will scale 



of v L . As 




(101) 
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as powers of e L ~ N (analogous to the power of e e that resulted from a single jump in 
Q of size s). We conjecture a scaling form: 



logKp( X )>]«LS(JJ (102) 

with H(X) a scaling function that is universal in the large L limit (and the v inserted 
to dedimensionalize p). Therefore we expect that (slightly sloppily due to neglect of 
corrections in the exponent) 

(P(X)) ~ N E (%) (103) 

Any class of rare process that yields a result of the above scaling form yields a 
lower bound on S(X). The simplest, of course, is the typical behavior which would 
yield S = 1 — X 2 corresponding to the simple gaussian shape of the fitness distribution. 
But we can do much better. 

A jump of of the nose by A j is followed by further growth of the lead as A ps Aje T 
due to the instability of the nose. But if Aj is not small, this will result some (scaled) 
time Ta 7 later in a large jump in T which will cause A to become negative for some 
period of time. We conjecture that the dominant contributions to the averages of p(X) 
occur at time T/^j after such a jump. For each X, we can then find the Aj that gives the 
largest contribution to S at that X. This yields a lower bound for the function S(X). 
For small X (for which small Aj dominates) 

~>l-X 2 + ^X 3 , (104) 
while at the average lead, the bound is 

2(1) >= 0.29 (105) 
which corresponds to a lower bound on the average nose population: 

(u L ) > N ' 29 . (106) 
Beyond the typical nose, we find 

E(X) > for X <X with X > 1.15 (107) 

corresponding to a large average population up to at least 15% above the average lead of 
the nose. Note that these results assume the fitness increments are all the same: if they 
are not, then other rare events involving larger than typical fitness increments could 
dominate. But with a short-tailed /i(s), simple considerations suggest that multiple 
small jumps will dominate over a single large one: this may well make the scaling 
function, 2(X), universal. 

The above results mean that the — at least implicit in many papers — efforts 
to compute the mean shape of the fitness distribution from close-to-deterministic 
approximations are doomed to failure. Of course, as the events that give rise to the 
large and skewed fitness distribution are very rare, what one should have been hoping 
for from the beginning was the typical rather than average behavior of the distribution. 
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Nevertheless, the rare fluctuations resulting in anomalously large averages should, in 
principle, start showing up as a lack of convergence of the inferred average as more and 
more separate evolutions (or separate times) are sampled. Note, however, that near the 
peak of the distribution, the effects will be small as the correction term to the gaussian 
is down by a factor of order 1/a/L at one standard deviation, \Jv, away from the peak. 

6. Conditioning on Behavior at Infinite Time 

The above analysis is rather cumbersome with several levels of Ansatz's and perturbative 
or asymptotic approximations — albeit ones that either have been or can be justified 
and done more systematically. It would be nice to have a cleaner approach in which 
properties of interest could be computed directly, at worst involving numerical analysis 
of deterministic dynamical equations such as the backwards in time evolution of <j). 

As fixing the population size exactly is both difficult and not necessarily realistic, 
and most of the properties depend only logarithmically on N, one might guess that 
constraining it so that it does not fluctuate too wildly would give similar behavior to 
fixing it exactly. Hallatschek, [11] has shown that by cleverly modifying the model to 
directly control the nose in a particular way, together with fixing the speed, one finds a 
well-behaved steady state for which one can obtain closed equations for the first moment 
of the population distribution. 

Here we discuss an alternate approach that has some advantages, in particular, 
it does result in everything being computable via deterministic backwards in time 
dynamics. Its behavior turns out to be quite similar to Hallatschek's model. 

The basic idea is quite general: for stochastic population dynamics for which at long 
times the total population size, N, will almost always go to either zero or infinity, one can 
analyze the behavior conditioned on it doing neither: i.e. on N(T — > oo) ^ or oo. The 
hope is that this will keep the histories of populations that stay finite but non-zero at long 
times close enough to the "ridge" that separates explosion from extinction, that they 
fluctuate in some well-behaved way around a typical population size characterizing the 
ridge. In the context of asexual evolution, Simon and Derrida [24] have analyzed a model 
in which the speed is fixed and the condition that there be a single individual surviving 
at infinite times is imposed, although the selection is quite different as discussed in 
Sec.(8). 

We consider the branching process at fixed speed (as analyzed above) but condition 
explicitly on the behavior at infinity. To obtain means, distributions or correlations of 
"operators" at times in a range near t, say a quantity R(t), one computes 

(H(t)) - W)e-* N ™ ~ R{t)e-* N{T) ) nm 

\ K W)(t ,T,f) 1 ,fh) ~ JjPfrNfX) Z e -foN(T)} ^U8) 

where to is the initial time with t < t < T. Then (R(t)) c = 

— oo, T— >oo 

(R(t)) (t ,T,p u p2) is a wel1 behaved limit independent of t and T. [ A 
simple example in which this can be seen is a single clone with a mean growth rate 
that is positive.] The limit ends up also independent of the initial state and of the 
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parameters as long as both are positive. Thus we can make the pi almost equal and 
replace the differences between them by a derivative with respect to p. This corresponds 
to weighting the behavior at time T by a factor proportional to N(T)e~ l3N ^ which 
suppresses both large and small N(T). 

The key quantities are the fixed point of the branching process equation, £j0 — 2 = 
and the lowest eigenfunction, ip(x), of the linearized operator about this fixed point 
which has eigenvalue — e. 

To compute any quantities of interest for times t in some range far from the final 
and initial times, we must find the generating function running backwards from very 
long times T starting with = (3, then adding in, at the appropriate times, the needed 
('s or 9's for the quantities of interest, and then running back to much earlier times. 
Finally, we have to project onto the initial conditions, differentiate with respect to p 
and then normalize (by doing the same thing with all the ('s and 9's equal to zero). 

At a time t long before the final time, all the transients will have become very 
small except the slowest one, so that 0* + B(P)ijje~ e< -- T ~^ with the amplitude B 
depending on p. We now add in the desired "fields" from the generating function: 
— <fi = 9 + Cl<f) — 4> 2 (if have ( at time t, this is simply added to at that time). If 9 
is small, such as to calculate, e.g., the mean of some quantity, then stays close to 0*. 
But to get the conditioning on the behavior at T, we need to keep the dependence on P, 
thus the non-linear part is still crucial: the term of order B times 9 is what is needed. 
The non-linear dependence of B on p will in the end drop out with the normalization 
factor, thus all we need is the behavior linear in B: derivatives with respect to B can 
be used instead of with respect to p. We now run the dynamics back to the initial time, 
t , which, for t — t large, will involve only the eigenfunction ip so that 

« 0* + Ail)e- e{t - to) . (109) 

The coefficient A is dominated by the {6}, but we are interested in its derivative with 
respect to B: this yields the conditioned generating function 

ZAM> - (no) 

Note that the normalization factor is now taken into account automatically, since in 
the absence of 9, A — B. The probability that iV neither diverges nor goes to zero in 
time T — t is, up to 0(1) factors associated with behavior near the initial and final 
times, simply e~ £ ( T ~ to \ Thus one loses a factor of order one for each sweep-time. This 
is because the fluctuations in the unconditioned, unconstrained N(t) are by factors of 
order unity in time r sw , as we have shown. 

The basic mathematical structure of the effects of conditioning at infinity is similar 
to that of Simon and Derrida [24]. 

6.1. Exactly "solvable" suppressed- fluctuation model 



Before analyzing the behavior of the population dynamics conditioned on infinite times, 
we observe that there is an exactly equivalent model that is local in time - again similar 



Asexual Evolution Waves: Fluctuations and Universality 



43 



to reference [24]. The local model can be obtained by, in the sense of dynamics in 
terms of time translation operators, commuting the operator at the late time T used to 
condition, back through the operators of interest at intermediate times, to the initial 
time. The resulting stochastic model, which we call the suppressed-fluctuation model, 
is 



with the first two terms the same linear operator at constant speed and stochastic noise 
as in the original model, and the two new terms preventing divergence and vanishing of 
N, respectively. Both 0* and ip are very small except near the nose. Thus the effect of 
conditioning is equivalent to a particular way of preventing the nose from getting too 
far ahead or behind the mean speed by modifying the growth rate of populations in the 
nose, with one part depending inversely on a weighted total population in the nose and 
the other strongly suppressing mutant populations that are substantially in front of the 
average lead. 

It is far from obvious that this non-linear model is exactly soluble - - but the 
generating function for the distributions of any quantities of interest can be computed 
from the backwards time dynamics as above. 

6.2. Distribution of population size 

To compute the distribution of the population size we make use of the simplification 
derived earlier that, at fixed speed in the absence of conditioning, the population size 
is proportional to the population in the nose, ul, a time t sw earlier, as long as the 
fluctuations are not too large. 

The Levy distribution with time dependent exponent cx(T) can be understood 
simply in the staircase model. As discussed above and in references [10, 21], the feeding 
of the yet-to-be established population with fitness s larger than the lead population 
which is growing exponentially, results in an essentially Levy distribution of the new 
higher fitness population once it is established with parameter a = ai ~ 1 — - = 1 — ~. If 
we consider each member of this population as giving rise, independently, to a population 
in the next higher fitness class with this distribution, then the dynamics becomes just like 
that of a population of individuals that have a Levy distribution of offspring. This can 
be iterated straightforwardly with the generating function at each successive step found 
by replacing the variable z in (e~ z " L ) by z ai . In the limit of 1 — Oi\ <C 1, this iteration 
results in a(T) = e~ T l T3W . The general analysis carried out above for a distribution of 
fitness effects yielded the same result. This implies that to a good approximation we can 
replace the full backwards in time dynamics of the generating function on time scales 
of order r sw and larger, by just keeping track of a single variable, z(T — r) with simple 
dynamics: 





dz 
dr 



z/t ( 



8W 



(112) 
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and adding in to z a ( at time t to get the distribution of ^l(£) and thence that of 
N(t + r sw ). 

We start with z — j3 at time T, iterate backwards to time t, add ( to z(t) and 
iterate back to time to- We then have 

z(0) = [( + ^''Y^ (113) 

and the distribution of Vhif) can be computed by taking the derivative of 
exp[i/ L (t )z(t )] with respect to (3, normalizing by the same with ( = 0, and taking 
the limit T — t — > oo and t — > — oo. The initial conditions then drop out — as they 
should in steady state — and we get simply, after rescaling by iV = l/( to get the 
distribution of iV rather than u L , 

c + c 



e' CN ) ~ (114) 



so that 



V[dN ] ~ ^-N/N (n5) 



and (AT) w TV. 

A more detailed analysis that takes into account corrections to the linearized 
dynamics of -P(r), etc., yields probability density proportional to N 5 e~ N ^ N with very 
small 5 ps l/[21og(A^(Q)]. The primary effect of this correction (which is only a factor 
of 1/e for A^ ~ Vn) is to make (^), which for neutral drift is often considered as 
the inverse of an "effective population size", finite: indeed only larger than 1/N by a 
logarithmic factor. [Note that this correction for small population sizes is associated with 
the failure of the approximation of Eq.(61): the very small A" behavior is determined by 
the C ^ C limit of the generating function which gives rise to backwards time dynamics 
that involves P(r) substantially smaller than Q so that the cutoff of <pp at P changes 
/ p<f)p substantially from the approximate form used earlier.] 

These results for the distribution of A" are strictly valid only in the huge population 
regime in which v/s 2 is not small: if it is small, then there will be almost periodic 
oscillations in N(t), with frequency vs and ratio of maximum to minimum of order 
exp(s 2 /8f ), superimposed on the longer time scale fluctuations. 



6.2.1. Suppression of nose fluctuations: The effects of the — 2<j>* p term in the dynamics 
suppress the rare large fluctuations of the advance of the nose that give rise to the Levy 
distribution of the nose population, v^. In the fixed Af model, these fluctuations cause a 
sub-population with a particular mutation that happens to establish anomalously early 
to have a substantial chance of taking over much of the population a time r sw later. As 
analyzed in reference [21], these relatively rare events control the coalescent properties. 
In the conditioned model, on the other hand, these rare events are exactly what are 
suppressed: thus we expect the dynamics of fixation and coalescent properties to be 
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somewhat different in this model. This can be seen by computing the distribution of 
vi,(t) given ^l(O) which has generating function 



e-™>\v L (0)) *> ^ (1 + C) i-a(t) — ( 116 ) 

derivable from Eq.(113). This yields a finite conditional mean of v L {i) that decays 
exponentially at rate e from vl(Q) to the mean vl which is unity. As will be shown 
elsewhere, this changes the coalescent properties from the universal ones analyzed in 
reference [21]. 

6.3. Moments and correlations of p(x,t) 

For the pure branching process model at constant v, integer moments of the distribution 
are, as shown for the mean in Sec. (5.3), very badly behaved, being dominated by 
extremely rare anomalously large events. But in the conditioned model, moments and 
correlations — such as (^l(^)^l(O)) found from Eq.(116) — are well-behaved and quite 
characteristic of the typical behavior. Beyond the simple approximation, moments and 
correlations can be derived order by order in a perturbative expansion around the fixed 
point, 0*. The eigenfunctions of the linear operator C\ = L\ — 2(f)* for deviations from 
0*, and those of its adjoint C VjC = C v — 20*, control the behavior. The corresponding 
discrete sets of eigenfunctions we denote ipj and Vj, respectively, with the orthonormality 
condition 

J dx^jVk = Sjk , (117) 
corresponding completeness relation 

T,^(x)Ux') = s(x-x') , (118) 

3 

and eigenvalues —6j. The smallest eigenvalue, 6q, and corresponding ipo, are what we 
have previously denoted simply e and ip. Because the non-linearity in the backwards- 
time dynamics for is simply 2 , the non-linear matrix elements that enter are 

Hij,i = J i)ii)jV k . (119) 

The mean of the population density as a function of fitness, is 

<p( X )>=25>,( X )^° (120) 

3 t 3 

which is dominated by the lowest eigenfunction of C ViC . Indeed, one can show that u is 
a weighted average of p: 

I p(x) \ 



Mx) , (i2i; 



the weighting being just that that appears in the dynamical model Eq.(lll), which is 
equivalent to conditioning at infinite time. One can also compute fluctuations of p and 
show that when v/s 2 is not small, the relative fluctuations of p, coarse-grained over a 
scale of order s, are only modest. 
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6.4- Relationship to Hallatschek model 

Hallatschek [11] has introduced a model that, while not obviously so, turns out to be 
quite closely related to our suppressed-fluctuation model. He considers constraining 
other linear functionals of the fitness distribution rather than the total population size 
and shows that a particular choice leads to closed equations for the average steady state 
(p(x)) a t fixed speed. The constraint is (in our notation) that 



i.e. that the quantity related to the effective number of individuals in the lead 
population, ul, is fixed to be equal to two. The dynamic equations that (p(x)) 
satisfies are qualitatively similar to our suppressed-fluctuation model in that the nose 
is effectively prevented from either running too far ahead or falling too far behind. 
Although the details are different, we conjecture that — in a sense that needs defining 
more carefully — the two constrained models are in the same universality class for 
large population sizes. But it is a different class than fixed-population-size models: 
most fluctuation- related properties are different. In particular, the coalescent properties 
are quite different than with fixed population size. [21] We leave an analysis of 
the similarities and differences between Hallatschek's model [11] and our suppressed- 
fluctuation model for future work. 

7. Field Theoretic Approaches 

Once one is using dynamic generating functions heavily, it is natural to ask whether 
one can enforce the "mean-field" constraint on the total population size in a functional 
integral formulation and then use well-established field-theoretic methods to analyze 
the behavior, including performing perturbative expansions around mean-field in some 
of the small parameters. Unfortunately, although it is straightforward to formulate the 
problem, analyzing it is far from easy and at this point it is not clear how much is 
gained. 

One can enforce the fixed population size constraint by making T depend explicitly 
on all the and {p} with the constraint enforced by an integral over an auxiliary field, 
K(t), that gives a delta function in N at each time. One can then compute the dynamic 
generating function in terms of a conjugate field, <f)(x,t): we use the same notation for 
future convenience although it is a somewhat different quantity than in the analyses 
thus far. After shifting <j)(x,t) by K(t), integrating over the noise, and using dots for 
time derivatives, we have a generating function of the form 




(122) 




(123) 



with "action' 




(124) 
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C\ = 4 - t— . (127) 



which is like that we could have obtained naively by putting in the fixed N constraint 
and integrating over T: here we have been careful in order to show that no Jacobian 
factors occur. 

We can now integrate out p to get a backwards time equation for of the usual 
branching process form but with an extra piece from the constraint that involves 

«(*) = K (125) 

together with the adjoint selection and mutation operator: 

-0=[4-T(t)]0-0 2 -/c(t). (126) 

Then we can go into the moving frame in terms of x = x ~ with 

d_ 
dx 

We have thereby reduced the full set of fluctuations of the {t]bd{x^)} to fluctuations 
in just two quantities: Y(t) and «(£). In principle, one could analyze this stochastic 
equation — perhaps perturbatively around T = v = constant and small k — and try to 
compute quantities of interest. This does not appear to be straightforward for similar 
reasons to the difficulties described below. 

Alternately, we could attempt to do a mean-field analysis by looking for a saddle 
point of the functional integrals by extremizing A. Setting the variation with respect 
to p to zero is equivalent to integrating p out since A is linear in p: this gives us the 
equation for 0. Expecting the saddle point solution to be uniform in time, i.e. constant 
speed, T = v and k constant, this has a well-behaved fixed point, (j>% v (x), f° r an Y 
k < k with k a velocity-dependent upper limit above which there is no well-behaved 
fixed point. Note, however, that <f)* KV does not have the interpretation of a fixation 
probability: indeed, for k > 0, it is negative for x l ess than some value just below Q. 

Formally, one can also extremize over K to yield / dxp = 0, as expected, and over 
T to give / dx4>* K (x) p(x) = 0- Extremizing over yields 

p=[C v - 2<f>( X )]p (128) 

which, with = 0*, is similar to the operator that appeared with conditioning on 
infinite times. But we know that for k = the eigenvalues of this linear operator 
will all be negative and thus there is no fixed point except p = corresponding to 
extinction. This will also be true for all but special values of k at which the operator, or 
equivalently its adjoint L\ — 20* acting on 0, has a zero eigenvalue. There is one value 
of k, k > 0, at which this occurs: not surprisingly, this is the same value of k beyond 
which the solution 0* breaks down. Thus the condition that a non-trivial saddle point 
exists yeilds k = k(y). The condition for a zero eigenvalue can be shown to force the 
apparently extra condition / 0*p = and also yields / XP = as one would expect 
in a naive "mean-field" approximation where T would be exactly the time-dependent 
average fitness. 

The problem, however, is that we are missing a condition. As the saddle point 
conditions are invariant under an overall rescaling of p, nothing fixes the total N. If we 
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plug back into the action, there is a boundary term K{T) J p(T) — K(0) J p(0) in the 
action up to time T. With K = nt this is Nk(v)T. We could now try to minimize over 
v but, not surprisingly, this would push v to infinity (for which k — > 0), the result for 
an infinite population. Better is to keep k slightly smaller than k(v) and then consider 
the linearized fluctuations around the (almost) saddle point. Integrating over these - 
which is rather complicated and we have not carried out in detail — gives a term in the 
action which is not too singular for k near k, has only logarithmic dependence on N, 
and is larger for larger v and k closer to k because the lowest eigenvalue decreases in 
these limits. Then minimizing the resulting approximation to the action — the sum of 
the boundary term and the contribution from fluctuations (both proportional to T) - 
over v will yield k of order 1/Nt sw with r sw only logarithmically dependent on N . 

For k v < k, the eigenvalue of C v — 20* will be slightly negative and it would 
appear that the population is doomed to extinction. However the fluctuations are very 
asymmetric — recall the statistics of the fluctuations in speed of the front, rj(t), that give 
rise to the Levy distribution — and the non-;linear effects of these (which are included 
in the linearized dynamics of around 0*) will prevent extinction. 

In principle, the quadratic fluctuations around the (almost) saddle point should 
yield the fluctuations in speed — i.e. diffusion constant — and how these are driven by 
the somewhat earlier fluctuations in the front: the value of \ & t which the non-linear 
term 2 becomes important and beyond which 0* ps x- It is n °t clear, however, that 
this procedure will yield a systematic asymptotic expansion rather than a (slightly) 
uncontrolled approximation. An important feature of the dynamics may well be 
problematic: even in the limit of arbitrarily large population sizes, the fluctuations 
in the population size at fixed v are of order the typical N. Although this only results 
in small fluctuations in v on time scales of order t sw , larger fluctuations occur on short 
time scales — 0(l/y/v) as shown in Sec. (5.2) - - and can thus not be treated by a 
low frequency expansion that one might hope would be valid. Whether the non-linear 
aspects of these fluctuations will cause fundamental problems with perturbative field 
theoretic approaches is not clear. 

As discussed in section Sec. (8), perturbative field theoretic methods may be much 
more useful for other models for which the time scales of the fluctuations of the nose 
and of the consequent adjustments of T are very different. 

7.1. Soft constraints on the population size 

An alternate approach is to allow the population to fluctuate but limit these fluctuations 
by an explicit coupling of the dynamics of T(t) to the population size N(t), its time 
derivative — related to the difference between the instantaneous mean fitness and T - 
or perhaps some convolution over past times of N(t — t). If the dynamical equations for 
dT/dt are linear in N (or more generally in p(x, t)) p can be integrated out and one is left 
with using the behavior of the Y(t) dependent backwards equation for to determine 
the dynamics of T. This is analogous to conventional dynamical mean field theories, 
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except that the fluctuations are far from gaussian. Unfortunately, it appears to be hard 
to find a dynamical model (even one for d 2 T/dt 2 which is non-linear in dT/dt) that is 
stable at all frequencies without knowing details about the nature of the fluctuations. 
The difficulties are associated with the fluctuations on time scales of order 1 / caused 
by fluctuations in timing of advances of the nose, as well as those on longer time-scales 
of order t sw (including the somewhat oscillatory nature of the response of T to the nose 
fluctuations as reflected in the complex poles in Eq.(88)), plus the need for T to move 
rapidly backwards when the population is shrinking to stop it going extinct. 

To prevent the total population size from fluctuating by factors more than of order 
unity, adjustments of T(t) are needed on time scales of order 1/y/v as discussed in Sec. 
(5.2). If one is willing to tolerate ignoring exponentially rare fluctuations that cause the 
population to go extinct suddenly, then a model of the form 



with iVo of order a typical population size and vq the corresponding average speed, is 
stable for ranges of a and (3 of order unity. Writing this in a functional integral form 
as in the previous subsection and integrating out p(x, t) results in dynamics of of the 
same form as Eq.(126) with k(t) = — (59(t) in terms of the field, 9(t), that multiplies 
in the action. This model can probably be analyzed in substantial detail, although 
how to systematically derive the slow-time-scale model of the coupling between the nose 
and mean — not to mention corrections to this — using the backwards-time dynamics 
of is far from clear. 

8. Summary and Extensions 

Understanding the dynamics of the simplest asexual evolution in large populations has 
not been straightforward due to several factors, primarily large fluctuations in some 
of the crucial quantities and a multiplicity of time scales. Here we have developed 
various methods for analyzing the dynamics including the statistics of fluctuations. The 
fundamental simplification is the separation between the stochastic dynamics of the 
nose of the fitness distribution — treated by branching process methods — and the 
non-linear dynamics of the bulk of the distribution that keeps the population size fixed. 
One of the issues we have focussed on is typical versus average behavior and ways of 
computing the former from generating functions. The analysis of the branching process 
aspects we have carried out with substantial generality and corrected some errors in 
previous analyses, [12] which do not, however, change most of the basic results. 

For the quantities that have been the focus of much of the previous work, the average 
speed of evolution as a function of population size (as well as the fitness distribution), the 
primary new result is the condition for determining N in terms of v from the backwards- 
time dynamics: N is the reciprocal of the value of (, (, that from <f>(x) — C = const. 
converges most rapidly to the fixed point 0* under the backwards time dynamics. This 
condition should replace various results in the literature — heuristic, somewhat ad-hoc, 




(129) 
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and/or cases in which p(x) is approximately gaussian. It is applicable far more generally 
that the regimes analyzed in detail thus far, including when deleterious mutations play 
a substantial role. For the model with only beneficial mutations analyzed here, we have 
illustrated a quick method for obtaining the lead, Q(v), of the nose and hence other 
properties. And we have shown that for many purposes, the fluctuations of the fitness 
of the nose are small enough (or rare enough) to justify using typical quantities. 

Beyond methodology, our focus has been primarily on the dynamics on time scales 
of order the nose sweep time, t sw , the time for the lead population that contains the 
important new mutations to collectively dominate the population and thus the time 
scale on which the stochastic and non-linear dynamics are coupled. This is also the 
time scale on which coalescence of genealogies occurs. [21] A simple stochastic model 
that couples the dynamics of the nose and the mean of the population is used to obtain 
the properties on this time scale, and its parameters derived from the underlying model. 

Note added: very recently, Neher and Shraiman, [23], have developed and analyzed 
an effective slow-time-scale model that couples the nose and mean of a population 
in which deleterious mutations cause the mean fitness to stop increasing or actually 
decrease: Muller's ratchet. In this situation, the dynamics and the methods for analyzing 
them are somewhat different but the basic effects of a time delay between the nose 
fluctuations and their consequent effects on the speed are qualitatively similar. 

8.1. Universality for huge populations with short-tailed mutation distributions 

In huge populations with short-tailed mutational distributions — /i(s) that falls off 
faster than exponentially for large fitness increments — there is much universality. The 
condition for the universal behavior is that log(iVs) ^> log 2 (s/Ub), with s and C7& the 
selective advantage and mutation rate, respectively, for the beneficial mutations that 
drive the evolution: these are determined mostly by fi(s). 

Many of the universal features occur on the sweep time scale t sw . Distributions 
of the fluctuations of certain quantities, in particular advances of the nose, are 
asymptotically universal and can be computed. A crucial simplification is that for 
many properties only the sweep time scale needs to be considered as the shorter time 
scales can be 'integrated out" with asymptotic analysis of the branching process that 
controls the dynamics of the nose on time scales less than r sw . Persistent variations in 
the mean of the fitness distribution, T(£), over time scales of t sw are very small being 
only of order 1/t sw — the smallest of the important fitness scales. These give rise to 
diffusion of the fitness (in addition to its steady rate of increase at speed v) with a 
diffusion coefficient which is a universal multiple of 

In the universal huge population regime, the average speed is large enough, v ^> s 2 , 
and concomitantly the standard deviation of the fitness distribution, a ~ y/v, is large 
enough that many subpopulations contribute substantially to the total population. The 
speed is then smooth on the time scale, 1/s, of the steps by which the nose advances. 
But there are transient fluctuations in T(t) on intermediate time scales of order l/\/v, 
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the time in which the mean fitness increases by a. In the asymptotic huge population 
limit, this time scale is much longer than 1/s but much shorter than t sw . Occasional 
variations in the timing of the new beneficial mutations that advance the nose cause, 
a time r sw later, transient increases in the mean fitness by of order which last for 
times of order 1 / y/v, corresponding to fluctuations in the speed by of order the average 
speed. The distribution of the amplitudes of these transient increases has a gaussian tail. 
These (somewhat-smoothed) jumps in the mean fitness have important consequences 
for experiments. They should be readily apparent when two or more otherwise-identical 
populations with different colored fluorescent markers are mixed together and evolved, 
as in various experiments. [25, 26] Observations of increases in fitness of one of the sub- 
populations relative to the other have a natural interpretation in terms of new beneficial 
mutations with magnitude inferred from the rate of increase of the fraction of the total 
population with that color. But one must be very cautious in interpreting such changes: 
jumps in the difference in fitness between the populations can be — indeed, for large 
populations are much more likely to be — the amplified consequences of much smaller 
differences in the times of occurrences of new mutations that advance the nose of the 
fitness distribution of one of the colors. 

The fate of most driving mutations is that eventually they go extinct because they 
are out-competed by the few that are lucky enough for their lineages to accumulate 
multiple subsequent mutations atypically early. [21] For those few that do eventually 
fix, they have typically accumulated several mutations early enough to result in jumps in 
the mean fitness. Although they fix only several sweep times later, the consequences of 
the fluctuations that cause these jumps is that mutations will fix in clusters: on average 
q = Q/s driving mutations at a time, with an exponential distribution of the size of the 
clusters. [1] 

The statistics of the fitness distribution are also universal for huge populations 
with short-tailed mutation distributions. But the fluctuations in the advance of the 
nose cause a power law tail out to large sub-population sizes which is only partially 
cutoff by the non-linear feedback that keeps the total population size constant. Indeed, 
the average fitness distribution is dominated by very rare fluctuations which make the 
average far larger than the typical envelope of the fitness distribution: we conjecture 
that the average is also universal. 

Much of the universality comes simply from the nature of the approach to the 
backwards-time fixed point, 0*, associated with the branching process: one slow 
direction governs the eventual approach with the controlling eigenvalue much smaller 
than all the others. More generally, the backwards time dynamics converges rapidly to 
a one parameter family of functions, <f)p{x), that obtain along this slow approach. And 
the essential features of this family are (asymptotically) independent of the mutation 
distribution and details of the short time dynamics of the nose. 

On shorter time scales than either r sw or I /y/v, there is substantial universality 
associated with details of the stochastic dynamics of the nose: new mutations that 
advance the nose and contribute substantially to the future — the driving mutations 
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- have fitness increments that vary little. As studied earlier, [10, 12] with short- 
tailed mutation distributions there is a narrow range of fitness increments around a 
"predominant" value s, that drive the dynamics. Although there are small variations 
in the timing and fitness increment of these driving mutations, only a small fraction of 
them — on average one per time r sw — are anomalous enough to result in the jumps 
in the mean fitness discussed above. 

In the huge population size limit, there is an additional fitness scale, (vs 2 )3 ^> s, 
that emerges: this is the range of fitness of almost all the individuals that are close 
enough to the nose to have a chance of taking over the population. Mutations from 
subpopulations in this range are important for the dynamics of the nose. 

Asymptotically, there are thus many important fitness scales (as shown in figure 1): 
from smallest to largest, these are l/r sw , the typical fluctuations in the fitness increase 
of the nose over a sweep time; s, the typical selective advantage of the mutations that 
drive the dynamics; (v s 2 )s, the range of fitness behind the nose from which an individual 
whose descendants takeover the population are likely to come; y/v, the standard 
deviation of the fitness distribution and the characteristic magnitude of transient jumps 
in the mean fitness that compensate for the fluctuations in the nose; and Q, the lead of 
the nose over the peak of the fitness distribution and the average total fitness increment 
associated with clusters of mutations that fix at the same time. Of course, as the ratios 
between these scales involve powers of logarithms, they will, in practice, never be far 
apart. Nevertheless, processes that occur on each of these time scales are important 
for understanding various aspects of the dynamics — even for this simplest model of 
evolution in large populations. The reduction of the complexities of the dynamics to the 
simple effective model coupling the nose and the mean, only strictly works for t sw much 
longer than the other time scales. But many of the qualitative — and semiquantitative 

- properties will be correctly captured by the effective model and the jumps in the 
mean fitness that it produces on intermediate time scales. 

We have not discussed coalescent properties here nor statistics of individual 
mutations (beyond the clustering of their fixations) or of genomes: these have been 
analyzed recently in this and related contexts, [21, 19] and our branching process results 
can be used to show their universality. But an important observation concerning the 
statistical properties of alleles is in order. As the results for the models considered here 
rely heavily on the Levy distribution of the fluctuations of the lead population and how 
these fluctuations grow with time, one might think that the cutoff of these fluctuations 
by the constant population size constraints would alter the coalescent and related 
properties. But if one is concerned with the relative frequencies of different mutant 
lineages, the rescalings of the effective lead population, vl, that result from adjustments 
of the mean, cancel out. Thus we expect that the statistics of the frequencies should 
still, up to corrections that are small for large q, be given correctly by the branching 
process analysis and the resulting Levy distributions. Thus, for example, the variations 
in the relative sizes of differently colored sub-populations (as discussed above) should 
reflect the statistics of the iterated Levy distributions. 
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8.2. Beyond the universal limit 

8.2.1. Modest-sized populations with short-tailed mutation distributions: Microbial 
laboratory populations can easily have iV <C i7& » 1, but are unlikely to have log(iVs) 
larger than log 2 (s/£4). [Note that as the magnitude, s, of the mutations that drive 
the evolution and the mutation rate, [/&, for these must be determined self-consistently, 
Ub can be much smaller than the total beneficial mutation rate making it even harder 
to get to the huge population regime.] In this "modest-sized" population regime, the 
average speed, v, is smaller than s 2 . This implies that the population will almost always 
be dominated by a peak in the fitness distribution that is narrower than s. And the 
mean of the fitness distribution will usually advance by steps of size roughly s following 
the close-to-step-wise advance of the nose. Nevertheless, anomalously early (or large) 
steps in the advance of the nose will sometimes give rise to jumps of the mean fitness 
of a few times s, similar to the 0{y/v) sized jumps in the huge population regime. But 
these jumps do not reflect similar sized steps of the nose: the mutations in the nose that 
cause these jumps are still quite close to s. 

When v/s 2 is small enough that the mean fitness is not smooth even without the 
anomalously large jumps, the detailed analysis we have carried is not strictly correct. 
But, as shown by DFG, the qualitative behavior is essentially the same and much of the 
quantitative aspects are only changed slightly, including the condition that determines 
the predominant s, s. The one notable exception is the range of fitness of subpopulations 
that have a reasonable chance of taking over the population in the future: with v < s 2 , 
some member of the population within s of the nose will almost always win (and the 
scale \v s] 1 / 3 — which is here less than s — no longer plays a role). Other than this, the 
behavior is — up to corrections that are likely to have smaller consequences than the 
approximations that gave rise to the model — very well captured by the analysis of the 
universal huge-population regime. 



8.2.2. Long-tailed mutation distributions: As discussed by several authors, [10, 17, 
11, 12] mutation distributions, fJ>(s), that decay more slowly than exponentially behave 
differently because individuals far from the nose can give rise to large effect mutations 
that contribute to advancing the nose: indeed, these, and the clonal interference 
competition between them, [5, 17] can dominate the dynamics. Although the methods 
used here can again be used to analyze the long-tail cases, there is substantially less 
universality: this is reflected in q (the number of mutations by which the nose leads 
to the mean) never becoming large even in asymptotically large populations. What 
happens instead is that the typical size, s, of the mutations that drive the evolution 
grows with the population size as a power of log N. [10, 14, 1] This causes the speed to 
grow faster than linearly in log N for large populations. Concomitantly, large oscillations 
in the speed of the mean fitness on time scales of order r est persist for arbitrary large 
populations since oc i og } NS \ ■ Nevertheless, the width of the distribution of selective 
strengths that drive the evolution can still be narrow compared to s — although this 
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needs analyzing in more detail to understand whether or not each new large-fitness- 
increment establishment also involves smaller effect mutations. 

If fx(s) falls off more slowly than exp(— Cs 13 " 2 ), with some borderline value of 
less than unity, the behavior for asymptotically large populations is different. The 
multiple mutation regime does not obtain: instead — at least in a loose sense — q 
sticks at two. DFG concluded that f3 2 — §, while more detailed considerations suggest 
that (3 2 is somewhat larger. Nevertheless, for more-modest population sizes, q depends 
on the mutation rate as well and can still be considerably larger than two with long- 
tailed ji as seen in simulations. [1] We thus conjecture that q will be non-monotonic in 
iV for sufficiently long-tailed fi. In the intermediate regime when q is considerably 
larger than two, aspects of the dynamics for modest population sizes are expected 
to be similar to modest-sized populations with short-tailed fi. It may be, however, 
that effects of fluctuations in the speed will be considerably larger in the long-tailed 
case because of the lack of a narrow predominant range of s and this might make the 
fixation probability approximation much worse than for short-tailed \i with similar q. 
Simulations at extremely large N and further analysis of the fluctuations would be 
needed to fully understand the long-tailed case. 

An oft-studied case is a simple exponential distribution of beneficial mutations. 
[27, 28, 8, 12] This is marginal in that q increases without bound as the population size 
increases — albeit very slowly. [10] Its marginality can also be seen from how the clonal 
interference approximation breaks down. [1] For an exponential mutational distribution, 
the branching-process fixed point equation becomes simply a second order differential 
equation. [12] The required absence of one of its two independent solutions in the linear 
regime sets the condition for Q. Again, this can be analyzed efficiently via Eq.(C5) as 
in Appendix C. 

8.2.3. Applicability of model: In general, as discussed in the Introduction, the condition 
under which the model studied here is applicable is that the statistical properties of the 
fitness landscape do not change much as the population evolves. This will certainly 
hold if the distribution of available mutations, fi(s), is the same for any genome in the 
region of the fitness landscape that can be accessed. But the discreteness of the set 
of beneficial mutations available means that the approximation that fi(s) is continuous 
may not be good. If there are a large number of available mutations in the relatively 
narrow predominant range around s that dominates the dynamics of the nose — the 
driving mutations — then the continuous approximation is valid. Furthermore, the 
exact values of the available s's in this range can vary as mutations are acquired. In 
this case, it is also reasonable to assume that the predominant-range mutations are not 
be depleted rapidly — as long as they occur in many genes whose effects are not too 
closely linked. 

In fact, the needed conditions are somewhat less restrictive: the behavior of the 
model will not change much even if the number of mutations in the predominant 
range does vary substantially. This is because the newly established population that 
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advances the nose by s, consists of a mixture of many sub-populations with different 
new mutations. If some of these new mutants have fewer available further mutations, 
their lineages are less likely to survive. But as long as the new mutants on average have 
similar number of predominant-range further mutations available, and there are enough 
mutant lineages in the nose that the fluctuations around this average are not too large, 
then the behavior will be essentially the same. The exact condition on the statistics of 
the "evolvability" of the new mutants — the full collection of possible multiple-mutation 
uphill directions from each — is not clear: it is an interesting avenue for future research. 

But their is an intrinsic problem with long-tailed distributions: for large 
populations, the predominant range is very far out in the tail of fj,(s) and continues to 
move further out as N increases so that the mutation rate of the predominant mutations, 
Ub, continues to decrease — in the clonal-interference approximation, C/j, ~ j^. [17] This 
means that three of the conditions for validity of the assumptions of the model are 
likely to break down: first, that there are a large number of mutations sufficiently far 
out in the tail, second, that such large effect mutations will not be depleted, and third, 
that even if enough new mutations are made available by the previous mutations to 
stop this depletion, their spectrum will be sufficiently independent of which mutations 
occurred first. Thus, while large effect mutations with a collectively low target size may 
in many circumstances drive the initial stages of the evolution, it seems unlikely that the 
further evolution will be well described by an even-close-to-steady long-tailed mutation 
distribution. A possible exception is discussed in the next subsection. 

8.2.4- Very high beneficial mutation rate: All our analysis has been carried out in 
the small mutation rate limit, Ub <C s: indeed, l/\og(sUb) has been used as a small 
parameter. Rouzine and collaborators [6, 16] have studied the opposite limit, Ub ^> s, 
possibly relevant for some viruses. From their results, one can anticipate that at high 
speeds the lead is sufficiently large that the time scale for growth of the lead population 
becomes fast compared to other time scales. Correspondingly, one can show (most 
simply by the methods used in the present paper) that the shoulder in 0* becomes 
sharp in this limit. The condition that determines Q in terms of v, Eq.(52), can then be 
used to analyze this case. Although there are several regimes, in the limit of very large 
populations the dynamics of the nose becomes similar to that in the small mutation 
rate limit that we and many others have studied. The universality thus extends to this 
regime although some of the time scales will change. [The intermediate population size 
regimes can be analyzed similarly although the behavior is somewhat different than that 
of the modest-population-size limit with small mutation rate.] 

8.2.5. Suppression of nose fluctuations: Hallatschek has introduced a model in which 
the nose of the distribution is constrained instead of the overall population size and he 
was able to compute various properties. [11] Here we have introduced a related model 
for which the computation of the generating function for any linear functional of the 
fitness distribution is reduced to analyzing deterministic dynamical equations. In both 
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these models, the mean speed is held fixed and the population size adjusts accordingly. 
Although this is very similar to what has, in effect, usually been done to analyze the 
steady-state of the fixed-population-size model, the nature of the fluctuations in the 
fixed-speed constrained- nose models is quite different. These constrained-nose models 
are in a different universality class. In particular, the fluctuations are far smaller, 
with only exponential rather than power-law tails in the size of sub-populations. This 
difference will also modify the coalescent properties analyzed in reference [21]. 

8.3. Extensions 

Most work has focussed on the statistical steady state of evolving populations or how 
the speed changes on long time scales due to the depletion of the supply of beneficial 
mutations. [6, 29, 23] But the initial dynamics starting from a clonal population is also 
of interest, especially for understanding laboratory evolution experiments (e.g. [25, 26]). 
The backwards-time methods can readily be generalized to studying the initial transients 
- indeed until the mean fitness of the population starts to increase substantially, the 
branching process approximation is very good. But the time-dependent speed of the 
mean, v(t), will have to be worked out self-consistently: this should again be doable via 
deterministic, rather than stochastic, computations although integrating the backwards 
time dynamics of may require numerics for realistic (non-asymptotic) parameters. 
Such questions as to whether individual mutations or multiple mutations will first 
takeover a population — especially with a long-tailed /i(s) — and variations between 
different initially identical populations (or sub-populations) are of particular interest 
and readily analyzable. 

The models discussed thus far all assume linear increase in growth rate as a function 
of fitness, Xj above the mean. But in many artificial and natural contexts, this is a poor 
assumption, especially when the fitness differences become large. An alternative is 
to select, for example, the best half of the population to divide and to kill the rest. 
[20, 19, 24] While more general forms of the growth rate as a function of x can be 
studied, the dynamics of very large populations is mostly determined by the large x 
limit as this is where the nose will be. For constant growth advantage for large x ~ 
such as the "best-half model — there is an absolute maximum speed (unless fi(s) has 
an exponential or slower tail [19]). Corrections to this speed limit have been analyzed in 
several contexts, (especially by Derrida and collaborators, reference [19] and references 
therein) and some analyses of fluctuations carried out. To obtain many of the basic 
results, the methods of the present paper can be used directly: indeed the analysis is 
somewhat easier due to the nature of the linear operators involved. There is a substantial 
simplification that occurs in the constant-growth-advantage model: the time scales on 
which the nose is unstable is much longer than the sweep time, r sw . A similar separation 
of time scales occurs for any sub-linear (for large x) increase in the growth rate. This 
should enable methods that take advantage of the separation of time scales, such as 
those of Sec. (5), to be developed. 



Asexual Evolution Waves: Fluctuations and Universality 



57 



Deleterious mutations, which we have thus far neglected, can be important in 
populations with high mutations rates (such as mutator strains of microbes or in viruses) 
and in very well adapted populations for which there are few beneficial mutations 
available. The combined effects of deleterious and beneficial mutations can be analyzed 
by combinations of branching process and effective stochastic models as we have done 
here. For asymptotic analysis via matching of different regimes as done in Sec. (3), 
deleterious mutations do not add major complications — at least as long as the speed 
is still positive which has been proved rigorously for asymptotically large iV with any 
distribution of deleterious and beneficial mutations. [30] However the quick method for 
obtaining Q from the sharp shoulder approximation — and thereby the approximation 
to the fitness distribution, p*Q — no longer works because the deleterious mutations from 
X > Q to x < Q ru i n the analytic properties of the linear part of <fi* (although Wiener- 
Hopf methods could potentially be used to handle this). For modest N, deleterious 
mutations can cause the speed to change sign — Muller's ratchet — in which case 
additional features occur. [31, 6, 29, 23, 18] 

A potentially important effect, especially when there are few beneficial mutations 
available with substantial fitness advantages, are double (or multiple) mutations. 
Individually, these are highly improbable, but if there are enough of them available, 
they could collectively drive the evolution. Such processes would involve one mutation 
that is either deleterious or whose benefit is sufficiently small that it cannot establish 
a new population that advances the nose, but which enables a spectrum of previously- 
unavailable further mutations some of which have larger benefit than the predominant 
single mutations. As the number of such two-hit processes is potentially very large - 
and they are, arguably, less likely to have been "found" by earlier evolution in different 
conditions — they could perhaps yield a long tail to /x(s) that drives the evolution and 
includes enough different processes that the tail is sustained as the evolution progresses. 
The general issue of mutational processes that do not go steadily uphill — and the 
balance between individual rareness and large numbers of possibilities — is an interesting 
direction for future study. [32] 

In large sexual populations with low rates of recombination — important at least 
for microbes — the evolutionary dynamics is again controlled by the nose of the fitness 
distribution. In this case, problems associated with fluctuations in the nose invalidate 
the methods of analysis of reference [33] for low recombination rates. Thus one would 
like better means to study these important fluctuations. Some of the approaches in 
the present paper — for example analyzing the rare events that dominate averages - 
may be useful for such populations in which the diversity and anomalously high fitness 
individuals are generated by a mixture of approximately-asexual accumulation of new 
mutations occasionally brought together or broken apart by recombination. 
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Appendix A. Average fitness distribution at fixed speed 



The dynamics of the average fitness distribution at fixed speed, p(x,t), is given by the 
linear deterministic equation: 
dp 



dt 



which can be solved via Laplace transforms. Defining 
p(\,t) = J dx exp(\x)p(x,t) , 
- noting the sign convention for A — one obtains 



p(X,t) = p(A + t,0)exp 



1 r x+t 
vXt--vt 2 + / dX'Mo(X') 

£ J X 



with the total population N(t) = p(A = 0,t) and 



M (X) = Jdsp(s) 



(A.l) 



(A.2) 



(A.3) 



(A.4) 



It is convenient to also define derivatives and an integral of the transform of the mutation 
spectrum: 

dM 



A<i (A) = 
M 2 (X) = 



dX 
d 2 M 
dX 2 



J dsp(s)se Xs 



ds p(s)s e 



2 As 



M-i(X) = J Q M (X')dX' = J dsp{s) 



- 1 



-A 



(A.5) 
(A.6) 
(A.7) 



If the initial distribution has maximum fitness above the mean of Q, the long time 
behavior will be dominated by the factor in the exponential of Eq.(A.3) 

fA+t 

dX'Mo(X') =M- 1 (X + t) -M-!(\), (A.8) 



/; 



which grows rapidly for t + X > log[l//x(s))]/s with s the s that dominates this integral 
at the time at which the integral becomes of order unity. Beyond time log[l//x(s))]/s, 
the position of the peak of the average distribution, p(x), grows exponentially. But the 
way in which the dynamics of the linearized equation for the average p(x, t) breaks down 
provides a useful bound on the actual behavior with fluctuations. 

Starting from a roughly gaussian fitness distribution cutoff at Q, the lead Q must 
be sufficiently large that with the correct speed, v, the average population — as it 
is an overestimate of the typical population — does not drop below its initial value, 
which it could only do before the new mutations start to dominate. For times longer 
than Q/v, the original population near Q and mutations that arise from this dominate 
and d\og{N) /dt Q — vt + A^o(^) which must thus be positive. The minimum of 
this expression, which occurs when dAi /dt = v, must thus either be positive or must 
occur before time Q/v. The value of A at which the nose-dominated contribution to 
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dlog(N) jdt is minimum we have denoted 7, with .Mi (7) = v. The argument above 
implies that if Q/v < 7, then Q — v 7 + .Mo (7) must be positive, thus generally, 

Q > y L = * i v-Mo{l) (A.9) 
the bound quoted in the text. 

Appendix A.l. Formal steady state fitness "distribution" 

The average fitness distribution will run away for any non-negative initial conditions. 
But formally, there exists a fixed point, p*(x), of the average equation, although it is 
not a distribution as it becomes negative. As the fixed point satisfies C v p* = 0, we have 

with 

£(\)+ X X = X*-v\ 2 /2 + M- 1 (\) . (A.ll) 

The behavior of p*(x) c & n be analyzed in different regimes via the inverse Laplace 
transform (or other means) to show that it is positive and well-behaved for x l ess than 
some value and beyond that it oscillates around zero with decaying amplitude as x 
increases further. 



Appendix B. Well-behaved solution in linear regime of 0* 

The general solution to the homogeneous linear equation, C\(p = 0, can be found by 
Laplace transforms, as above. We have 

d 

-v\ - — +M (\) 

which can be integrated and inverse Laplace transformed to obtain the formal solution 

4>{x) = I ^ xX+£W • (B.2) 

The integral must be done along an appropriately chosen contour, C. There are saddle 
points at values X(x) that are solutions to 

v\-M (\) = x- (B.3) 

The saddle point approximation is closely-related to the log-slope WKB 
approximation discussed in the main text (a related approximation is used in reference 
[6].) It can be justified over most of the range of Xi an d how it breaks down near Q also 
seen, by rescaling x by Q, and A by t sw — although this is somewhat unnatural as it 
makes fitness scales and time scales no longer reciprocals. With the rescaling X = x/Q 
and A = \/t sw , the exponent S + %A is proportional to ^- ~ log(As) times a function 
of X and A except for he mutational term which has a coefficient that becomes of the 
same order only when A is within st sw of unity. This thus yields the value of x a t which 
mutations become important and in this regime, the rapid dependence on A changes 



0(A) = 



(B.l) 
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the large parameter that would justify the saddle point approximation from ^ to j^-: 
i.e. it is only valid for x near to Q in the high speed limit in which is large and even 
then, it eventually breaks down as discussed below. 

For large negative x> there is a saddle at A ~ x/ v f° r which the mutational term 
is negligible, hence a contour that goes over this saddle would yield ~ exp(x 2 /2v) 
for large negative x : this corresponds to the badly behaved, but generic solution as 
this saddle is the highest. But there is also a saddle with large positive real A at 
Mo(\) ~ —X- This corresponds to a solution that goes to zero rapidly as x decreases. 
We thus want to choose a contour that goes over this saddle but not over the negative A 
saddle (or its continuation for positive x)- There are also complex A saddles with 3?(A) 
large and positive: in general there could be contributions from the linearly independent 
solutions defined by contours that go over each one of these saddles, but the coefficients 
of these must be small enough that their oscillatory behavior does not cause <fi* (x) to go 
negative — which it cannot. The complex-saddle solutions go to zero faster than this 
real-saddle solution for large negative x> thus their contributions will be negligible in 
this limit. We thus have a single condition that is needed to give well-behaved behavior 
in the linear regime for Q — x large: that there is no contribution to the single badly 
behaved solution. This single condition is sufficient to determine Q. [Note that the 
behavior is qualitatively similar to defining Airy functions via a contour integral, which 
is essentially an inverse Laplace transform: to get the well-behaved solution, Ai, requires 
not going over a particular saddle point, otherwise the exponentially growing solution, 
Bi, is mixed in.] 

The asymptotic behavior of 0* for large negative x is 

-log(0*)«(-x)A( X ) (B.4) 
with A the saddle point, which is at, e.g., A Mog(— x/Ub) for the constant s model, 
and A ~ \\J^ log(— x/Ub) for a half-gaussian mutational spectrum of width o. Note 
that 0* is much smaller than the badly behaved upside-down gaussian solution. 

What happens to the linear solution as x increases? At a critical value of Xi 
X = v 7 — Aio(l), the saddle merges with the badly behaved saddle, as signaled by 
d\/dx = oo: this occurs at A = 7 and x — X with .Mi (7) = v — the same special 
value that appeared earlier. Beyond x, the saddle splits into two complex saddles and 
for slightly larger x, 4> would start oscillating. This suggests that the breakdown of the 
linear approximation must occur near x as guessed from the analysis of the deterministic 
approximation to p(x, t) discussed above. As the relevant linear operators are adjoints 
of each other and their Laplace transforms simply related, this is hardly surprising. 

The condition for the validity of the saddle-point or WKB approximations for the 
smooth part of log0, is that the curvature at the saddle point, A^i(A), times the square 
of the scale at which the non-quadratic terms come in is large. Near to 7 we can expand 
the exponent, £ + xA, in the contour integral: 

S + XA « (x - x)(A - 7) + ^M 2 (7)(A - 7? (B.5) 
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and infer that the scale on which the saddle-point approximation breaks down is when 

X - X ~ [^2(7)]* ~ (vs)* ■ (B.6) 

When t; » s 2 , this is a distance of more than s from x and higher order terms will have 
smaller effect. But for smaller v, the scale s is less than (vs)s and higher order terms 
will make the saddle point approximation break down for x ~ X °f order s. 



Appendix B.l. Moderate speeds, v <^ s 2 

To find the fixed point, 0*, we must match the linear solution(s) to the non-linear 
solution just below Q. For small v/s 2 , the behavior is relatively simple. For Q — x 
substantially less than s, the mutations primarily come from the non-linear region which 
gives Q jQ_ x n(s)ds in the fixed point equation. This is much smaller than the other 
contributions which are both of order Q 2 exp[— (Q — x)Q/ v ] until values of s near the 
peak of exp(sQ/v)/i(s) start to enter: this occurs for x near Q — § ior Q/v ~ 7. For 
larger x, i.e. in the range Q — s < x < Q — v/Q, 

0* » Qexp[-(Q - X )Q/v + (Q - xY/M . (B.7) 

We pull out, as in the main text, the rapidly varying factor, e^ x and use / = 
6 -7(<2-x)q which matches onto the shoulder solution near Q and obeys, in the regime 
of interest for x < (Q — §, the linearized fixed point equation 

- V ~ H ~ x)f ~J ds ^ s y' S f(x + s) (B.8) 

where the — v^+x<j> has become —v^ — (v ; y—x)f an d the convolution over /i(s)0(x+s) 
a convolution over fi(s) exp( ; ys)f(x + s) with / dsfj,(s) exp(7«) = A^o(7) ~ f- [ Note 
that in the main text we added and subtracted this times fix) to yield f(x + s) — f(x) 
in the convolution together with a term (x — x)f{x)] 

Near Q — s, the mutational input to / must play a significant role. If it did not, 
the balance of the other terms would make / start to increase as x decreases implying 
that the log-slope of 0* would continue to get more and more negative, making the 
mutational contributions more negligible and leading to the badly behaved exp(x 2 /2t> ) 
solution for 0*. But if the mutational term were too large, it would tend to drive / 
negative. Thus the mutational input at x ~ roughly v /2s, needs to be comparable to 
(-U7 — Q + s)f(Q — s) w (try — Q + s)Q exp[— s(Q/v — 7) + s 2 /2v]. With s 2 /v relatively 
large, this condition implies that the terms in the exponential roughly cancel yielding 
the result in the main text: 



Q ^t>7 + \s + 



X + \s + 



- \og{s 2 /v) 

S 



(B.9) 



-log(s /v) 
_ s 

since the A^o(7) contribution to x is —v/s which is smaller than the terms above in this 
moderate speed regime. The behavior of 0* can be constructed step by step with the 
dominant balance being between the (1*7 — x)f an d the mutational input terms. The 
steps will be of size close to s as long as 7 stays sufficiently close to 7 that s ~ s still 
dominates the mutational input, which it will for multiple steps below Q. On top of the 
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parabolic dips of log / between each integer multiple of s below Q that are caused by 
those in the first segment, the envelope can be seen to be roughly 



-log/ 



Q-x 



log \ e ~ ) +log(s 2 /^) 



which corresponds to a log-slope of of 



l{x) - 7 ~ ~ log 



s(x~X + s/2) 



In this regime, .Mo (7) ~ ex p[(7 — j)s]v/s thus the WKB approximation yields 



l{x) ~ 7 ~ ~ log 



s>7 - x) 



(B.IO) 



(B.ll) 



(B.12) 



which, with correction factors inside the log, is almost the same. 



7,2 



Appendix B.2. High speeds, v ^> s 

When v ^> s 2 , the saddle point approximation breaks down much further below Q than 
in the moderate speed case. But the behavior just below Q is much smoother as the 
exp[(Q — x) 2 /2f) factor in <f>*(x) between Q and Q — s is close to unity. These both 
suggest expanding in derivatives. Again, this is best to do in terms of the rescaled 
function, /. We have 



o = \ M ^)% + (x-x)f + o 



vs 



, 2 #7 

dx 3 



(B.13) 



with M.2(l) ~ vs. The scale of x~X that enters we denoted b = [M. 2 ('y) ~ (fs/2)». 
This suggests that the neglected terms will be smaller by a factor of (s 2 /t>)5. The 
dominant terms are simply Airy's equation, = yf in the variable y = and 
we are interested in the / oc Ai(y) solution that decays rapidly as y — > oo: it can be 
seen that this matches on to the WKB approximation in this regime. For negative y, 
the solution oscillates, first passing through zero at a value — ai = —2.34. Thus we 
expect Q < x + aib. Very near to Q the logarithmic derivative of / should be (from the 
previous subsection) Q/v — 7 = —Mq^/v + (Q — x)/ v ~ —1/5 + (Q — X)/ v which, 
with Q — x < a \b is close to —1/s. This is much larger in maginitude than the typical 
logarithmic derivative of / from the Airy solution which is of order (vs~)~s. But near to 
its zeroes, the derivative of logAi diverges, in particular as l/(y + a±) near to its first 
negative zero. Thus near to Q, the log-slope of (f>* is 

7-7^— . (B.14) 

X + a Y b-x 

In the interval (Q — s,Q), the mutational input is very different than at lower x an d 
the Airy approximation is not valid. Thus it is easiest to match the solutions near to 
Q — s at which the log-slope is still close to 1/s: this means that Q should be closer to 
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the Airy zero than 0(s) therefore 

<3»X + Oi(«s/2)3+o(s) (B.15) 
^7t;- ^ + ai {vs/2)t (B.16) 

^^jv - Mo{^) + a 1 [M 2 (^)f 3 (B.17) 

with the last expression in terms of the M.k{l) being somewhat more general. A more 
detailed calculation, outlined below, yields a correction term in Q of order — 1/7 which 
is indeed much less than s. 

Appendix C. Sharp shoulder approximation 

In the main text we introduced the well-justified approximation of treating the shoulder 
as a sharp boundary at Q with the linearized equation for \ < Q an d the effective 
boundary condition that 4>*{Q) = Q The approximate fixed point equation is 

^J* + Q I" fi(s)ds = (C.l) 
jQ-x 

with the second term the input from \ > Q- This is again solved by Laplace transforms 
with the input term and a boundary term —vQe~® x (from the motion of the shoulder 
at speed v in the rest frame) combining to yield an inhomogeneous equation of the form 

tj* + /(A) = (C.2) 

with 

"Mo(A) 



/(A) = Qe~ QX 



(C.3) 



A 

The — in O v that comes from the x term in C\ means that the solution is of the form 

0*(A) = e £{x) j X d\'e- £ Wl(\') . (C.4) 

with the exponent £(A) = —v\ 2 /2 + A^-i(A) as defined earlier. A multiple of 
the homogeneous solution, e £ ^ x \ is included in the general solution by virtue of the 
undetermined lower bound on the integral over A'. Unlike for the formal linear solution 
over (—00, 00), the Laplace transform of 0* on (—00, Q) should be well behaved. In the 
left half-plane 0* should be e~ x ® times an analytic function since the solution should 
vanish for x > Q- The difficulty is that the integral over a contour parallel to the 
imaginary axis that is required for the inverse Laplace transform cannot be deformed 
into the left half-plane to make <p*(x > Q) vanish because of the e~ vX2 l 2 term in e £lyX) 
which diverges badly at ±ioo. Thus for general Q, there is no well-behaved solution. 
This should not be surprising as we know that the general linear solution grows as e x2 ^ 2v 
for large negative x while the solution that is well-behaved for large negative x wm n °t 
obey the boundary condition at Q that makes it vanish for x > Q- The only way the 
solution can be well-behaved in both ways is if the integral over A' almost exactly cancels 
the divergence at A — > ±ioo. With the lower bound of the integral at A' = — ioo, the 
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cancellation occurs near this end but, in general, not near the +ioo end — unless the 
integral over the full range vanishes. Thus the condition for a well-behaved solution is 
that 



e 



-e(x') 



I(X')d\' = (C.5) 



which will occur only at discrete values of Q, only one of which — if there are more 
than one — will yield an everywhere-positive 0*. (Note that other values of Q, which 
have negative for some x < Q, can yield the eigenfunctions and eigenvalues of the 
linear operator that determines the convergence to (jf ) 

The above condition that determines Q is a solvability condition enforcing the 
condition that the projection onto the badly behaved solution vanishes. The function 
e -£(A) j n wh^h j s expressed, is the transform of the formal steady state solution, p*, 
of the linear dynamics of p which is the zero eigenvector of the adjoint operator, C v . 
Transforming back to \ space, we obtain the condition quoted in the main text: 

Jdxp* Q (xMx) = vp* Q (Q) (C.6) 

with Q(x) = jQ_ x dsp(s). 

For the high velocity limit, v ^> s 2 , Q is in the regime in which the Airy 
approximation is valid and the solvability condition becomes essentially 

Ai[(Q - x)/b] ~ ^Ai[(Q -x + §)/b] (C.7) 

with the left-hand side arising from the boundary term and the right from the mutational 
input term. This yields, as Ai goes to zero linearly, the result that Q is below the position 
of the first zero of the Airy function by I/7. But this is of order the width of the shoulder 
region so that the sharp wall approximation breaks down. Nevertheless, the scale of the 
correction should still be of order 1/7. It will turn out that the fluctuations in speed 
at fixed population size correspond to fluctuations in the nose by of order 1 / r sw ~ I/7 
over times of order r sw . Thus it is not possible to define Q to a precision of order I/7 
without taking into account these fluctuations. But even to do that well, one would 
need to redefine Q in a way that is useful in the presence of the much larger fluctuations 
in the position of the mean discussed in Sec. 5.2. [Note that the contributions from the 
complex saddle points with ^(A') ~ 2irk/s for non- zero- integer k, which the contour 
also goes over, are strongly suppressed for v ^> s 2 corresponding to the smallness of the 
deviation of exp[— (Q — x) 2 /2t> from unity in the interval (Q — s, Q).] 
In the moderate velocity limit, «€s 2 , the result found that 

Q ~ X + 1$ + V ~ [logics) + 1] ~ ^7 + \s + V -~ log(7§) (C8) 

can be derived simply by doing the integral in the solvability condition along the line 
3?(A') = 7. This expression for Q only differs by a small correction from the heuristic 
result and is close to that of Good et al [12] . Note that in this limit one can neglect the 
mutational term in S(X') in deriving Q: this also justifies the neglect of all the mutational 
effects except for the input from the saturation regime, which is equivalent to neglecting 
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the effects of mutations for all but the lead population as it is being established. If p*Q is 
replaced by its gaussian approximation, then the condition Eq.(52), weights the s that 
contribute by a factor /i(s) exp(^ — ^ which is the same weighting factor that appears 
in Good et al [12] . 



Appendix D. Dynamics of <p 

Here we derive some of the results summarized in the main text about the backwards- 
time dynamics of <p. With the definition r = T — t, |^ = C\(f) — 2 with "initial" 
condition </>(%, r = 0) = C(x) : C — const, being of particular interest. We again first use 
Laplace transforms in x f° r t ne linear part: 

^ + ^ = [- v \ + M (\)](j>(\, t) + nonlinear terms (D.l) 

so that functions of A — r will appear. With initial condition, (j>o(\) = 0(A,O), the 
linearized evolution yields 

4>(X,r) = MX-r)e E ^ (D.2) 

with 

VT 2 r x 



E = -v\t + — + / d 1 M Q (j) . (D.3) 

Z J A— r 

For constant initial condition, </>(%, 0) = (, so that only A = exists initially, this yields 
simply 

w Cexp[xr- 1 -vt 2 + M-^r)} . (DA) 

Formally, we can proceed to handle the non-linearity in the sharp boundary 
approximation used for the fixed point. The boundary part in the equation for must 
again be supplemented by the mutational input from the saturation region above -P(t') 
for all t' < t, and the effects of these integrated over r' from zero to r. The condition 
that r) = for \ > P{ T ) f° r an r can m principle be used to determine P(t) but 
the analysis is messy and we have not carried it out in detail. 

A heuristic approach again works well. We can do an approximate analysis in 
terms of the log-slope, 7(x,t), which obeys, if sufficiently slowly varying that the 
approximation <f>(x + s) ~ (f>(x) ex P[7(x) s )] is valid, 

g = l-|b-^!(7)] (D.5) 

which implies that 7 grows linearly in time while being advected at speed v — .Mi (7). 
This hints at the range of validity of the linearized expression for 0: it should be 
valid provided the path of advection from the initial condition to the point of interest 
does not pass through the boundary of the linear region which is the curve P(t). 
It also shows how the effects of the shoulder region just below P(t) propagate to 
smaller x- Take an advection curve that passes through a point (x, t) of interest 
which at an earlier time r' hit P{r')\ with 7' = 7(P(r / ),r / ), this occured when 
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X = P(t') + v(t — t') — .Mo (7' + t — t') + A^o(t')- The linear growth of 7 as it is 
being advected then yields 7(x, t) ~ 7' + r — r'. This will only be applicable when the 
log-slope at P has reached roughly 7 which is the value at which the advection speed 
become negative. 

For r less than 7, the linearized approximation ignoring the input from the 
saturation region will be good up to P(r) which can be found from the self-consistency 
condition 

P(r) « C exp[rP(r) - vt 2 /2 - M_i(r)] (D.6) 

indicating only modest corrections to the simple result ignoring mutations: Eq.(57). 
Beyond r ps 7, the behavior will be determined, as for the fixed point, by in a small 
region just below P(r) and the mutational input to it from just above P(r). The 
remaining effects of the original £ will quickly become negligible near the shoulder as, 
without mutational input from higher x, they would give a rapidly dropping leading 
to an increasing P(r) as seen in the solution ignoring mutations, P ~ ^ log(P/C) + vt/2 
which has a minimum near r ps ^2 \og(P/()/v of P min ps log(v^/C)- Note that if 
£ ps £ ps 1/iV, P m j„ is close to Q. 

In the modest speed limit with v of order s 2 or smaller, we can again make the 
approximation that the only mutational effects that matter for determining P(r) are 
those from the saturation region. Input at an earlier time r' from just above P(r') to 
X ~ P(t') — s and its effects move to higher x at speed v so that if r' = t — t s with t s 
such that 

P(r - *,) - s + u* s = P(r) , (D.7) 

the effects of this input will reach P(r) at r. The contribution to 0(P(r),r) from this 
input is, to logarithmic accuracy, roughly 



P(t) exp 



-A{s)+t s (P(r-t s )-s) + ^t 2 s 



(D.8) 



To get 0(P(r), r) pa P(r) which is required for self-consistency, the integral over s of the 
exponential factors must be approximately unity. Anticipating that this integral will be 
sharply peaked, we can approximate the integral by the maximum of the integrand so 
that 



max 

s 



-A(s) + (t s )(P( T -t s )-s) + ^t 2 



(D.9) 



If P(r) is slowly varying, we can expand P(r — t s ) ^ P(t) — j^t s and obtain 

*« ~ ~^iP ( D - 10 ) 

with the dominant s determined by the maximization condition. Substituting in and 
keeping terms to firs order in ^/f, we obtain the two conditions 

/ dP\ dA ,_ , 
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The fixed point obtained from ^ = is at P* = Q with Q essentially the same as 
obtained earlier. 

Note that s is determined, up to small corrections, by dA/ds = A/s = 7 so that the 
s that dominates the input near the shoulder is slightly larger than s. But this effect 
cancels in determining Q ps v'y + |s which agrees with the above for v <C s 2 as it should. 
The static condition, P = Q, has t s = s/v so that there is a —s 2 /2v term in Eq. (D.9). 
This condition for determining the pre-dominant s (and also Q) is essentially the same 
as that from Eq.(52) with the gaussian approximation for p*Q and also that of Good et 
al [12] , as discussed earlier. 

As long as jf~/ v i s small, the neglect of higher order terms and derivatives can be 
justified and we obtain the simple result 

£«$<«-n (D.i3) 

so that the fixed point is approached exponentially with eigenvalue e ~ /Q ~ 1/t sw . The 
stability backwards in time corresponds to the instability of the nose so it is controlled 
by the same eigenvalue e. 

In the high speed limit with v ^> s 2 , the analysis is somewhat more involved because 
the effects of mutations that cascade down to P — 0[(vs)s] and are carried back up to 
P are all important. However the basic result for the exponential approach to the fixed 
point with eigenvalue close to v/Q remains the same. 

When starting from a constant value of = ( at time T, until T — 7 ps T — r sw the 
dynamics is mostly linear — corresponding to deterministic for p(x, t) — in the regime 
of interest, and P(r) changes rapidly as given by Eq.(57). During a narrow time range 
around T — 7, P does not change much but a form of that is very similar to that 
at the fixed point is established below P. After this time, the dynamics is exponential 
relaxation of P(t) to the fixed point Q with the shape of r) close to its fixed point 
form but shifted by P(t) — Q. The results in the main text follow. 



